19. Abstract (Continued)

classes of neuronal signals: Drives that are defined to be signal levels and
reinforcers that are defined to be changes in signal levels. Defining drives and
reinforcers in this way, in conjunction with the neuronal model, suggests a basis for a
neurobiological theory of learning. The proposed neuronal model is an extension of the
Sutton-Barto (1981) model which, in turn, can be seen as a temporally refined extension
of the Rescorla-Wagner (1972) model. It is shown that the proposed neuronal model
predicts the basic categories of classical conditioning phenomena including delay and
trace conditioning, conditioned and unconditioned stimulus duration and amplitude
effects, partial reinforcement effects, interstimulus interval effects including
simultaneous conditioning, second-order conditioning, conditioned inhibition,
extinction, reacquisition effects, backward conditioning, blocking, overshadowing,
compound conditioning, and discriminative stimulus effects. The neuronal model also
eliminates some inconsistencies with the experimental evidence that occur with the
Rescorla-Wagner and Sutton-Barto models. Implications of the neuronal model for
animal learning theory, connectionist and neural network modeling, artificial
intelligence, adaptive control theory, and adaptive signal processing are discussed.

It  is  concluded  that  real-time  learning  mechanisms  that  do  not  require  evaluative 
feedback  from  the  environment  are  fundamental  to  natural  intelligence  and  may  have 
implications  for  artificial  intelligence.  Experimental  tests  of  the  model  are 
suggested.

SECTION 1


INTRODUCTION 

Pavlov (1927) and Hebb (1949) were among the first investigators to
extensively analyze possible relationships between the behavior of whole
animals and the behavior of single neurons. Building on Pavlov's
experimental foundation, Hebb's theoretical analyses led him to a model
of single neuron function that continues to be relevant to the
theoretical and experimental issues of learning and memory. There had
been earlier attempts to develop such neuronal models. Among them were
the models of Freud (1895), Rashevsky (1938), and McCulloch and Pitts
(1943) but, to this day, the neuronal model proposed by Hebb has remained
the most influential among theorists. Current theorists who have utilized
variants of the Hebbian model include Anderson, Silverman, Ritz, and
Jones (1977), Kohonen (1977), Grossberg (1982), Levy and Desmond
(1985), Hopfield and Tank (1986), and Rolls (1987).

In this report, I will suggest several modifications to the
Hebbian neuronal model. The modifications yield a model which
will be shown to be more nearly in accord with animal learning
phenomena that are observed experimentally. The model to be
proposed is an extension of the Sutton-Barto (1981) model.

After defining the neuronal model, first qualitatively and then
mathematically, I will show, by means of computer simulations, that the
neuronal model predicts the basic categories of classical conditioning
phenomena. Then, I will discuss the neuronal model in more general
theoretical terms, with particular reference to the psychological notions
of drives and reinforcers. My conclusion will be that the model offers a
way of defining drives and reinforcers at a neuronal level such that a
neurobiological basis is suggested for animal learning. In the
theoretical context that the neuronal model provides, I will suggest that
drives, in their most general sense, are simply signal levels in the
nervous system and reinforcers, in their most general sense, are simply
changes in signal levels. This seems too simple and, indeed, it is, but I
hope to show that it is not that much too simple. I will attempt to make
a case for drives and reinforcers being viewed, in their essence, as
signal levels in the nervous system and as changes in signal levels,
respectively. The result will be a theoretical framework based on what I
propose to call a drive-reinforcement model of single neuron function.

SECTION 2


THE  NEURONAL  MODEL 

Qualitative description

I  will  begin  by  defining  the  drive-reinforcement  neuronal 
model  in  qualitative  terms.  It  will  be  easiest  to  do  this  by 
contrasting  the  model  with  the  Hebbian  model.  Hebb  (1949) 
suggested  that  the  efficacy  of  a  plastic  synapse  increases 
whenever  the  synapse  is  active  in  conjunction  with  activity  of  the 
postsynaptic  neuron.  Thus,  Hebb  was  proposing  that  learning  (i.e., 
changes  in  the  efficacy  of  synapses)  is  a  function  of  correlations 
between  approximately  simultaneous  pre-  and  postsynaptic  levels  of 
neuronal activity.

I  wish  to  suggest  three  modifications  to  the  Hebbian  model: 

(a)  Instead of correlating pre- and postsynaptic levels of
     activity, changes in presynaptic levels of activity
     should be correlated with changes in postsynaptic levels
     of activity. In other words, instead of correlating
     signal levels on the input and output sides of the
     neuron, the first derivatives of the input and output
     signal levels should be correlated.

(b)  Instead of correlating approximately simultaneous
     pre- and postsynaptic signal levels, earlier presynaptic
     signal levels should be correlated with later
     postsynaptic signal levels. More precisely and
     consistent with (a), earlier changes in presynaptic signal
     levels should be correlated with later changes in
     postsynaptic signal levels. Thus, sequentiality replaces
     simultaneity in the model. The interval between correlated
     changes in pre- and postsynaptic signal levels is suggested
     to range up to that of the maximum effective interstimulus
     interval in delay conditioning.

(c)  A change in the efficacy of a synapse should be
     proportional to the current efficacy of the synapse,
     accounting for the initial positive acceleration in the
     s-shaped acquisition curves observed in animal
     learning.

A refinement of the model will be noted now and discussed more fully
later. The ability of the neuronal model to predict animal learning
phenomena is improved if, instead of correlating positive and negative
changes in neuronal inputs with changes in neuronal outputs, only
positive changes in inputs are correlated with changes in outputs. To
clarify this, positive changes in inputs refer to increases in the
frequency of action potentials at a synapse, whether the synapse is
excitatory or inhibitory. Negative changes in inputs refer to decreases
in the frequency of action potentials at a synapse, whether the synapse
is excitatory or inhibitory. Furthermore, the changes in frequencies of
action potentials I am referring to will be relatively abrupt, occurring
within about a second or less. It is hypothesized that more gradual and
long-term changes in the frequency of action potentials at a synapse do
not trigger the neuronal learning mechanism.

After the neuronal model has been defined precisely and the results
of computer simulations have been presented, it will be seen that this
model of neuronal function bears the following relationship to models of
whole animal behavior. In general, changes in presynaptic frequencies of
firing will reflect the onsets and offsets of conditioned stimuli. In
general, changes in postsynaptic frequencies of firing will reflect
increases or decreases in levels of drives (with drives being defined
more broadly than has been customary in the past). In the case of the
neuronal model, changes in the levels of drives (which will usually
manifest as changes in postsynaptic frequencies of firing) will be
associated with reinforcement. With regard to the behavior of whole
animals, the notion that changes in drive levels constitute reinforcement
has been a fundamental part of animal learning theory since the time of
Hull (1943) and Mowrer (1960). Here, I am taking the notion down to the
level of the single neuron. Changes in signal levels, which play a
fundamental role in the neuronal model being proposed, have long been
recognized to be of importance. For example, Berlyne (1973, p. 16) notes
that "many recent theorists have been led from different starting points
to the conclusion that hedonic value is dependent above all on changes in
level of stimulation or level of activity. They include McClelland,
Atkinson, Clark and Lowell (1953), Premack (1959), Helson (1964), and
Fowler (1971)."

Before concluding this introduction to the drive-reinforcement
neuronal model, it will be useful to briefly note how the model relates
to earlier models from which it derives. The derivation and evolution of
the model will be discussed more fully later. As has already been
indicated, the drive-reinforcement model is an extension of the
Sutton-Barto (1981) model. The Sutton-Barto model, in turn, can be
viewed as a temporally refined extension of the Rescorla-Wagner (1972)
model. I will show that the drive-reinforcement model eliminates some
shortcomings of the Rescorla-Wagner and Sutton-Barto models. Both of the
latter models predict strictly negatively accelerated acquisition or
learning curves. The Rescorla-Wagner model also predicts extinction of
conditioned inhibition. Consistent with the experimental evidence, it
will be seen below that the drive-reinforcement model predicts (a) an
acquisition curve that is initially positively accelerating and
subsequently negatively accelerating and (b) conditioned inhibition that
does not extinguish. In addition, the drive-reinforcement model solves
some problems with conditioned stimulus duration effects that arise in
the case of the Sutton-Barto model.


Mathematical  specification 

The proposed neuronal model may be defined precisely as follows.
The input-output relationship of a neuron will be modeled in a fashion
that is customary among neural network modelers. Namely, it will be
assumed that single neurons form weighted sums of their excitatory
and inhibitory inputs and then, if the sum equals or exceeds the
threshold, the neuron fires. Such a model of a neuron's input-output
relationship can be based on the view that neuronal signals are binary
(either a neuron fires or it doesn't) or on the view that neuronal
signals are real-valued (reflecting some measure of the frequency of
firing of neurons as a function of the amount by which the neuronal
threshold is exceeded). Here, the latter view will be adopted. Neuronal
input and output signals will be treated as frequencies. This approach
to modeling neuronal input-output relationships is consistent with
experimental evidence reviewed by Calvin (1975).

Mathematically, then, the neuronal input-output relationship may be
specified as follows:

$$y(t) = \sum_{i=1}^{n} w_i(t)\, x_i(t) - \theta \tag{1}$$

where $y(t)$ is a measure of the postsynaptic frequency of firing at
discrete time $t$; $n$ is the number of synapses impinging on the neuron;
$w_i(t)$ is the efficacy of the $i$th synapse; $x_i(t)$ is a measure of the
frequency of action potentials at the $i$th synapse; and $\theta$ is the
neuronal threshold. The synaptic efficacy, $w_i(t)$, can be positive or
negative, corresponding to excitatory or inhibitory synapses,
respectively. Also, $y(t)$ is bounded such that $y(t)$ is greater than or
equal to zero and less than or equal to the maximal output frequency,
$y'(t)$, of the neuron. Negative values of $y(t)$ have no meaning as they
would correspond to negative frequencies of firing.
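A minimal sketch of equation (1) in Python may make the bounding of y(t) concrete. The function name, argument names, and numerical values below are assumptions chosen for illustration; only the weighted sum, the threshold, and the two bounds come from the text.

```python
def neuron_output(weights, inputs, theta, y_max):
    """Equation (1): weighted sum of inputs minus threshold, bounded to [0, y_max]."""
    if len(weights) != len(inputs):
        raise ValueError("one weight per synapse is required")
    total = sum(w * x for w, x in zip(weights, inputs)) - theta
    # Negative firing frequencies have no meaning, so clip at zero;
    # y_max plays the role of the maximal output frequency.
    return min(max(total, 0.0), y_max)


# Hypothetical values: one excitatory synapse (w = 1.0) and one
# inhibitory synapse (w = -0.5).
y = neuron_output(weights=[1.0, -0.5], inputs=[0.75, 0.5], theta=0.25, y_max=1.0)
```

Clipping after the subtraction is one simple way to enforce the stated bounds; the text does not say how the bounds are imposed.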

To complete the mathematical specification of the neuronal model,
the learning mechanism described earlier in qualitative terms remains to
be presented. The learning mechanism may be specified as follows:

$$\Delta w_i(t) = \Delta y(t) \sum_{j=1}^{\tau} c_j \, |w_i(t-j)| \, \Delta x_i(t-j) \tag{2}$$

where $\Delta w_i(t) = w_i(t+1) - w_i(t)$, $\Delta y(t) = y(t) - y(t-1)$, and
$\Delta x_i(t-j) = x_i(t-j) - x_i(t-j-1)$. $\Delta w_i(t)$ represents the
change in the efficacy of the $i$th synapse at time $t$, yielding the
adjusted or new efficacy of the synapse at time $t+1$. $\Delta x_i(t-j)$
represents a presynaptic change in signal level at time $t-j$, and
$\Delta y(t)$ represents the postsynaptic change in signal level at time
$t$. $\tau$ is the longest interstimulus interval, measured in discrete
time steps, over which delay conditioning is effective and $c_j$ is an
empirically established learning rate constant which is proportional to
the efficacy of conditioning when the interstimulus interval is $j$. The
remaining symbols are defined as in equation (1). A diagram of the neuron
modeled by equations (1) and (2) is shown in Figure 1.
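The learning rule of equation (2) can be sketched in Python as follows. This is an illustrative reading of the equation, not the author's simulation code: the function name, the list-of-lists history layout, and the numerical values are all assumptions. The weight bounds and the later refinement of the rule are deliberately left out.

```python
def delta_w(i, t, w_hist, x_hist, y_hist, c):
    """Equation (2): change in efficacy of synapse i at discrete time t.

    w_hist[s][i], x_hist[s][i]: efficacy and input of synapse i at step s.
    y_hist[s]: postsynaptic output at step s.
    c: learning-rate constants; c[0] is unused, c[1..tau] are used.
    """
    tau = len(c) - 1
    if t < 1:
        return 0.0  # no earlier output to difference against
    dy = y_hist[t] - y_hist[t - 1]            # postsynaptic change at time t
    total = 0.0
    for j in range(1, tau + 1):
        if t - j - 1 < 0:
            break                             # not enough history yet
        dx = x_hist[t - j][i] - x_hist[t - j - 1][i]   # earlier presynaptic change
        total += c[j] * abs(w_hist[t - j][i]) * dx
    return dy * total


# Hypothetical example with tau = 2: the input switches on at step 1 and
# the output rises two steps later, at step 3.
c = [0.0, 4.0, 2.0]
x_hist = [[0.0], [1.0], [1.0], [1.0]]
w_hist = [[0.5], [0.5], [0.5], [0.5]]
y_hist = [0.0, 0.0, 0.0, 1.0]
change = delta_w(0, 3, w_hist, x_hist, y_hist, c)   # 1.0 * (2.0 * 0.5 * 1.0)
```

In this example only the term with j = 2 contributes, because the presynaptic onset preceded the postsynaptic change by two time steps.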

Figure 1. A model of a single neuron with n synapses. Presynaptic
frequencies of firing are represented by $x_i(t)$, synaptic efficacies by
$w_i(t)$, and the postsynaptic frequency of firing by $y(t)$. The
input-output (I/O) relationship is specified by equation (1) and the
learning mechanism (L.M.) is specified by equation (2) in the text.

Generally, in interpreting and working with equation (2), I have
adopted the following assumptions, consistent with what is known of
learning involving the skeletal reflexes. I usually consider each
discrete time step, $t$, to be equal to one-half second. This is a
meaningful interval over which to obtain measures of the pre- and
postsynaptic frequencies of firing, $x_i(t)$ and $y(t)$. Also, it is
probably a reasonable interval of time with respect to the learning
processes underlying changes in synaptic efficacy. For example, the
optimal interstimulus interval for classically conditioning a skeletal
reflex is nominally one-half second [optimal interstimulus intervals vary
from about 200 to 500 ms depending on the species and the response system
within the species (see review by Woody, 1982)], and very little or no
conditioning is observed with intervals approaching zero or exceeding
three seconds (Frey and Ross, 1968; McAllister, 1953; Russell, 1966;
Moore and Gormezano, 1977). Thus, in equation (2), indexing starts with
$j$ equal to 1 because $c_0$ is equal to zero, reflecting the fact that no
conditioning is observed with an interstimulus interval of zero. $c_1$ is
assigned the maximal value, reflecting the fact that one-half second is
(approximately) the optimal interstimulus interval. Then, $c_{j+1}$ is
less than $c_j$ for the remaining c-values, reflecting the decreasing
efficacy of conditioning as the interstimulus interval increases beyond
one-half second. $\tau$ is normally set equal to 5 because, when $j$
equals 6 (corresponding to an interstimulus interval of three seconds),
little or no conditioning would occur so $c_6$ would be approximately
equal to zero.
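The constraints on the learning rate constants can be illustrated with a short Python sketch. The particular c-values below are hypothetical and are not given in this section; they were chosen only to satisfy the stated constraints (c_0 equal to zero, c_1 maximal, and decreasing values out to tau = 5). The helper name `learning_rate` is likewise an assumption.

```python
STEP_SECONDS = 0.5   # one discrete time step, as assumed in the text
TAU = 5              # longest effective interstimulus interval, in time steps

# Hypothetical learning-rate constants, indexed by interval j in time steps.
C = [0.0, 5.0, 3.0, 1.5, 0.75, 0.25]


def learning_rate(isi_seconds):
    """Return c_j for an interstimulus interval given in seconds."""
    j = round(isi_seconds / STEP_SECONDS)
    if j < 1 or j > TAU:
        return 0.0   # no conditioning at zero interval or beyond tau steps
    return C[j]


optimal = learning_rate(0.5)      # j = 1, the maximal constant
too_long = learning_rate(3.0)     # j = 6, treated as zero
```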

A lower bound is set on the absolute values of the synaptic weights,
$w_i(t)$. The bound is near but not equal to zero because synaptic weights
appear as factors on the right side of equation (2). It can be seen that
the learning mechanism would cease to yield changes in synaptic efficacy
for any synapse whose efficacy reached zero; i.e., $\Delta w_i(t)$ would
henceforth always equal zero. A lower bound on the absolute values of
synaptic weights results in excitatory weights always remaining
excitatory (positive) and inhibitory weights always remaining inhibitory
(negative); i.e., synaptic weights do not cross zero. This is consistent
with the known physiology of synapses (Eccles, 1964). A nonzero lower
bound on the efficacy of synapses is also consistent with evidence
suggesting that potential conditioned stimuli are weakly connected to
unconditioned responses prior to conditioning (Gould, 1986; Schwartz,
1978; Pavlov, 1927). Also, a nonzero lower bound on the efficacy of
synapses models the notion that a synapse must have some effect on the
postsynaptic neuron in order for the postsynaptic learning mechanism to
be triggered. That learning mechanisms are postsynaptic, at least in
phylogenetically advanced organisms, has been well argued by McNaughton,
Barnes, and Rao (1984). In the case of the mammalian central nervous
system, Thompson, McCormick, Lavond, Clark, Kettner, and Mauk (1983) note
that what little evidence now exists is perhaps more consistent with the
hypothesis of postsynaptic rather than presynaptic learning mechanisms.
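One way to realize the lower bound on the absolute values of the weights is sketched below in Python. The bound of 0.1 and the function name are assumptions; the text says only that the bound is near but not equal to zero.

```python
W_MIN = 0.1   # hypothetical lower bound on |w|


def apply_update(w, dw, w_min=W_MIN):
    """Add dw to w while keeping the sign of w and |w| >= w_min."""
    if w == 0:
        raise ValueError("a synapse must start with a nonzero efficacy")
    new_w = w + dw
    if w > 0:
        return max(new_w, w_min)     # excitatory weights stay excitatory
    return min(new_w, -w_min)        # inhibitory weights stay inhibitory


clamped_excitatory = apply_update(0.5, -1.0)   # would cross zero, held at 0.1
clamped_inhibitory = apply_update(-0.5, 1.0)   # would cross zero, held at -0.1
```

Because the weight never reaches zero, the factor |w_i(t-j)| in equation (2) never permanently silences the learning mechanism for that synapse.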

In general, it is expected that the efficacy of synapses, $w_i(t)$, is
variable and under the control of the neuronal learning mechanism.
However, some synapses can be expected to have fixed weights; i.e.,
weights that are innate and unchangeable. This may be true for many or
most synapses in the autonomic nervous system. In the somatic nervous
system, it is likely that many more synapses and perhaps most are
variable or "plastic". In the case of the drive-reinforcement neuronal
model, it will be assumed that synapses mediating conditioned stimuli
have variable weights and that synapses mediating unconditioned stimuli
have fixed weights. The innately specified synaptic weights that are
assumed to mediate unconditioned stimuli are expected to reflect the
evolutionary history of the organism.

Let us now consider what is happening in equation (2). As the
specification of the learning mechanism for the drive-reinforcement
neuronal model, equation (2) suggests how the efficacy of a synapse
changes as a function of four factors: (1) learning rate constants,
$c_j$, that are assumed to be innate; (2) the absolute value,
$|w_i(t-j)|$, of the efficacy of the synapse at time $t-j$, when the
change in presynaptic level of activity occurred; (3) the change in
presynaptic level of activity, $\Delta x_i(t-j)$; and (4) the change in
postsynaptic level of activity, $\Delta y(t)$.

One way of visualizing either the Hebbian or the drive-reinforcement
learning mechanism is in terms of a temporal window that slides along the
time line as learning occurs, changing the efficacy of synapses as it
moves along. In the case of the Hebbian model, the learning mechanism
employs a temporal window that is, in effect, only one time step wide.
The learning mechanism slides along the time line, modifying the efficacy
of synapses proportional to (1) a learning rate constant, (2) the
presynaptic level of activity, and (3) the postsynaptic level of
activity. (The Hebbian model will be presented in mathematical form
later.) In the case of the drive-reinforcement model, the learning
mechanism employs a temporal window that is $\tau+1$ time steps wide. The
learning mechanism slides along the time line modifying the efficacy of
synapses proportional to (1) learning rate constants, (2) the efficacy of
synapses, (3) changes in presynaptic levels of activity and (4) changes
in postsynaptic levels of activity. It can be seen that the Hebbian
learning mechanism correlates approximately simultaneous signal levels
and the drive-reinforcement learning mechanism correlates temporally
separated derivatives of signal levels. (In the case of the
drive-reinforcement model, I am not suggesting that a neuron would have to
compute anything as refined as a first derivative. A first-order
difference will suffice, as will be demonstrated later.) The differences
in the behavior of the Hebbian and the drive-reinforcement learning
mechanisms will be examined below when the results of computer
simulations of both models are presented.

Properties  of  the  model 

The drive-reinforcement neuronal model suggests that what neurons
are learning to do is to anticipate or predict the onsets and offsets of
pulse trains. By pulse trains, I mean sequences or clusters of action
potentials in axons. The model neuron learns to predict the onsets and
offsets of pulse trains representing unconditioned stimuli, utilizing the
onsets of pulse trains representing conditioned stimuli. This will
become evident when the results of computer simulations are presented.
It will be seen that the learning mechanism moves the onsets and offsets
of pulse trains to earlier points in time. Fundamentally, the learning
mechanism is a shaper of pulse trains. The efficacy of a synapse changes
in a direction such that the neuron comes to anticipate the unconditioned
response; i.e., the conditioned stimulus comes to produce the conditioned
response prior to the occurrence of the unconditioned stimulus and the
unconditioned response. The way the drive-reinforcement neuronal
learning mechanism shapes pulse trains is illustrated in Figure 2. Many
investigators, including Pavlov (1927), have pointed to the anticipatory
or predictive nature of conditioning phenomena [e.g., see Kamin (1968,
1969), Rescorla and Wagner (1972), Dickinson and Mackintosh (1978), and
Sutton and Barto (1981)].

Refinement of the model

The drive-reinforcement neuronal learning mechanism, as defined by
equation (2), can be refined in a way that improves the model's ability
to predict animal learning phenomena. The refinement, as briefly noted
earlier, involves allowing only positive changes in presynaptic signal
levels to trigger the neuronal learning mechanism. In other words,
$\Delta x_i(t-j)$ must be greater than zero. If $\Delta x_i(t-j)$ is less
than zero, it is then set equal to zero for the purpose of calculating
$\Delta w_i(t)$ in equation (2).
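The refinement amounts to rectifying the presynaptic difference before it enters equation (2). A minimal Python sketch follows, with a hypothetical function name and a hypothetical square-pulse conditioned stimulus:

```python
def positive_change(x_now, x_prev):
    """Rectified presynaptic change: negative changes are set to zero."""
    return max(x_now - x_prev, 0.0)


# Hypothetical CS pulse: onset at step 1, offset at step 3.
signal = [0.0, 1.0, 1.0, 0.0]
changes = [positive_change(signal[t], signal[t - 1]) for t in range(1, len(signal))]
# changes == [1.0, 0.0, 0.0]: only the onset can trigger learning
```

The offset at step 3 produces a difference of -1.0, which the rectification discards, so only the stimulus onset contributes to the sum in equation (2).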


[Figure 2 panel titles: (A) ONSETS, (B) OFFSETS; trace labels: US,
CR/UR BEFORE LEARNING, CR/UR AFTER LEARNING]

Figure 2. Examples of how the drive-reinforcement learning mechanism
alters the onsets and offsets of pulse trains for a single theoretical
neuron. Panels (a) and (b) show the effects of unconditioned stimulus
onset and offset, respectively. In each example, the conditioned
stimulus (CS) is followed by an unconditioned stimulus (US), both of
which represent presynaptic signals. The two presynaptic signals are
assumed to be mediated by separate synapses, with the CS-mediating
synapse having a variable efficacy (weight) under the control of the
neuronal learning mechanism. The conditioned and unconditioned response
(CR and UR) before and after learning (i.e., before and after a number of
presentations of the CS-US pair) are shown below the waveforms for the
CS and US pulse trains. The conditioned and unconditioned response
(CR/UR) represents the postsynaptic frequency of firing of the neuron.
In panels (a) and (b), it is seen that the onset and offset of firing,
respectively, occur earlier in time after learning. Thus, in each
case, the neuron has learned to anticipate the unconditioned response by
learning to start firing earlier (panel a) or stop firing earlier
(panel b).


There is an intuitive basis for this refinement. A negative change
in presynaptic signal level means that the presynaptic signal is falling
away; i.e., that it is headed toward zero. If such a negative change in
presynaptic signal level were to trigger the neuronal learning mechanism
and possibly cause a synaptic weight to change, then a synaptic weight
would have changed for a synapse that just ceased to carry the signal
that caused the change. That is to say, the relevant part of the signal
on which the synaptic weight should operate would no longer be present.
Some residual portion of the signal might still be present after the
negative change in presynaptic signal level. However, the residual
portion of the signal is not relevant because it might have been there
long before the negative change in presynaptic signal level and might be
there long afterward. With the drive-reinforcement neuronal learning
mechanism, only the dynamic part of the signal is relevant, as will be
more clearly seen after the computer simulations are presented. This is
not to suggest that a drive-reinforcement learning mechanism would
preclude learning about negative changes in levels of stimuli. However,
if such changes are to trigger a drive-reinforcement learning mechanism,
it is suggested that they would have to be, in effect, inverted, such
that they would manifest in some part of the nervous system as positive
changes in signal levels.

Allowing only positive changes in presynaptic signal levels to
trigger the neuronal learning mechanism is part of a strategy of not
changing a synaptic weight unless there is good reason to believe the
weight change will be useful. Such a strategy seems reasonable because,
in a neural network, there is always the possibility that a synaptic
weight change will interfere with or constitute overwriting of a previous
weight change. Thus, weight changes are to be minimized.

The rationale offered above for refining the learning mechanism does
not constitute a rigorous argument. However, it is hoped that the
rationale provides some insight into why the refinement might make sense.
Later, a more rigorous approach will be taken. It will be shown that the
basic categories of classical conditioning phenomena are predicted by the
neuronal model when only positive changes in presynaptic signal levels
are allowed to trigger the learning mechanism. Then, it will be shown
how the model's predictions deviate from the experimental evidence when
both positive and negative changes in presynaptic signal levels can
trigger changes in synaptic weights.

Derivation and evolution of the drive-reinforcement model from earlier
models

Having defined the neuronal model in qualitative and mathematical
terms, I will now describe the model's derivation and evolution from
earlier neuronal models. The neuronal learning mechanisms that have been
proposed, leading to the drive-reinforcement model, will be portrayed in
two ways: (1) by means of the sequence of critical events that have been
hypothesized to lead to learning and (2) by means of the equation that
characterizes the learning mechanism. As it is customary to number
equations, I will also number the critical event sequences so that I can
refer to them later. To distinguish them from the equation numbers, an
"S" will be added as a prefix to the critical event sequence numbers.

Hebb suggested that the sequence of critical events for learning was
simple:

$$x_i(t) \rightarrow y(t) \rightarrow \Delta w_i(t) \tag{S-1}$$

In other words, presynaptic activity, $x_i(t)$, followed directly by
postsynaptic activity, $y(t)$, was hypothesized to result in a change,
$\Delta w_i(t)$, in the efficacy of the associated synapse. (The convention
adopted in this report is that when presynaptic activity, $x_i$, is a
direct cause of postsynaptic activity, $y$, then $x_i$ and $y$ will have
the same time step, $t$, associated with them.) The equation for the
Hebbian learning mechanism may be written as follows:

$$\Delta w_i(t) = c \, x_i(t) \, y(t) \tag{3}$$

where $c$ is a learning rate constant and the other symbols are as
defined earlier.
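For comparison with equation (2), the Hebbian rule of equation (3) can be sketched in a few lines of Python. The function name and the values are assumptions. As a simplification, the output y is held fixed here rather than recomputed from equation (1).

```python
def hebbian_delta_w(c, x_i, y):
    """Equation (3): weight change proportional to simultaneous signal levels."""
    return c * x_i * y


# Hypothetical sustained activity: input and output both stay at 1.0.
w = 0.0
for _ in range(3):
    w += hebbian_delta_w(c=0.5, x_i=1.0, y=1.0)
# w has grown by 0.5 on every step, although no signal level changed
```

Under these assumptions the weight keeps growing for as long as the pre- and postsynaptic levels stay high, because the rule depends on signal levels rather than on changes in them.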

Hebb's model is an example of a simple real-time learning mechanism.
Real-time learning mechanisms emphasize the temporal association of
signals: each critical event in the sequence leading to learning has a
time of occurrence associated with it and this time plays a fundamental
role in the computations that yield changes in the efficacy of synapses.
It should be noted that "real-time", in this context, does not mean
continuous time as contrasted with discrete time nor does it refer to a
learning system's ability to accomplish its computations at a sufficient
speed to keep pace with the environment within which it is embedded.
Rather, a real-time learning mechanism, as defined here, is one for which
the time of occurrence of each critical event in the sequence leading to
learning is of fundamental importance with respect to the computations
the learning mechanism is performing. Real-time learning mechanisms may
be contrasted with nonreal-time learning mechanisms such as the
perceptron (Rosenblatt, 1962), adaline (Widrow, 1962), or back
propagation (Werbos, 1974; Parker, 1982, 1985; Le Cun, 1985; Rumelhart,
Hinton, and Williams, 1985, 1986) learning mechanisms for which error
signals follow system responses and only the order of the inputs,
outputs, and error signals is important, not the exact time of occurrence
of each signal, relative to the others. For additional discussions of
real-time learning mechanism models, see Klopf (1972, 1975, 1979, 1982,
1986), Moore and Stickney (1980), Sutton and Barto (1981, 1987), Wagner
(1981), Grossberg (1982, 1987), Schmajuk and Moore (1985), Gelperin,
Hopfield, and Tank (1985), Blazis, Desmond, Moore, and Berthier (1986),
Tesauro (1986), and Donegan and Wagner (1987). Proposals for real-time
models that give especially careful attention to neurobiological
constraints are those of Hawkins and Kandel (1984) and Gluck and
Thompson (1987).

Klopf  (1972,  1982)  proposed  an  extension  to  Hebb's  model  that 
introduced  the  notions  of  synaptic  eligibility  and  reinforcement  into 
real-time  learning  mechanisms,  resulting  in  a  neuronal  model  that 
emphasized sequential rather than simultaneous events. The following
sequence  of  critical  events  was  hypothesized  to  lead  to  learning: 

$$x_i(t-k) \rightarrow y(t-k) \rightarrow s(t) \rightarrow \Delta w_i(t) \tag{S-2}$$

where $s(t)$ is the sum of the weighted inputs to the neuron at time $t$ and
$k$ is the nominal interval of time required for a neuronal output to feed
back and influence the neuronal input, the feedback occurring either
through the remainder of the neural network or through the environment.
The variable $s(t)$ represents the neuronal membrane potential. In this
model, presynaptic and postsynaptic activity, $x_i(t-k)$ and $y(t-k)$, when
they occur in conjunction, render a synapse eligible for modification.
However, the efficacy of an eligible synapse does not change unless the
subsequent membrane potential, $s(t)$, is nonzero, $s(t)$ functioning as a
reinforcer that follows the eligibility computation. The equation for
the learning mechanism is as follows:

$$\Delta w_i(t) = c\,x_i(t-k)\,y(t-k)\,s(t) \tag{4}$$
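
The role of eligibility and reinforcement in equation (4) can be sketched for a single synapse as follows. The function name, the use of short history lists, and all numerical values are illustrative assumptions; only the form of the update comes from equation (4).

```python
def eligibility_update(w, x_hist, y_hist, s_t, k, c=0.1):
    """Apply equation (4) to one synapse:
    delta_w(t) = c * x(t-k) * y(t-k) * s(t).

    x_hist, y_hist: sequences whose last element is the value at time t.
    s_t: membrane potential s(t), acting as the reinforcer.
    k: feedback delay in time steps (must satisfy k < len(x_hist)).
    """
    eligibility = x_hist[-1 - k] * y_hist[-1 - k]   # x(t-k) * y(t-k)
    return w + c * eligibility * s_t


x_hist = [1.0, 0.0, 0.0]   # presynaptic activity at t-2, t-1, t
y_hist = [0.5, 0.0, 0.0]   # postsynaptic activity at t-2, t-1, t
w_reinforced = eligibility_update(0.1, x_hist, y_hist, s_t=0.8, k=2, c=0.5)
w_unreinforced = eligibility_update(0.1, x_hist, y_hist, s_t=0.0, k=2, c=0.5)
```

The synapse is eligible in both calls, but its weight changes only when the later membrane potential is nonzero.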

In the context of real-time learning mechanisms, the notions of
synaptic eligibility and reinforcement based on sequential rather than
simultaneous events yielded a neuronal model that could make greater
contact with the experimental evidence of classical and instrumental
conditioning (Klopf, 1972, 1982). A further step was taken in that
direction when Barto and Sutton (1981a) discovered that replacing $s(t)$ in
sequence (S-2) above with $\Delta s(t)$ permitted the neuronal model to make
much more substantial contact with classical conditioning phenomena. The
resulting neuronal learning mechanism is described by the following
critical event sequence:

$$x_i(t-k) \rightarrow y(t-k) \rightarrow \Delta s(t) \rightarrow \Delta w_i(t) \tag{S-3}$$

where $\Delta s(t) = s(t) - s(t-1)$.

The equation for the learning mechanism is:

$$\Delta w_i(t) = c\,x_i(t-k)\,y(t-k)\,\Delta s(t) \tag{5}$$

This form of learning mechanism led to a simplification. Barto and
Sutton (1981a) found that the critical event sequence (S-3) could be
replaced with the following simpler sequence:

$$x_i(t-k) \rightarrow \Delta y(t) \rightarrow \Delta w_i(t) \tag{S-4}$$

$\Delta y(t)$ in sequence (S-4) replaces $\Delta s(t)$ in sequence (S-3). This
can be seen to be plausible in that $\Delta y(t)$ implies $\Delta s(t)$. However,
proceeding from sequence (S-3) to sequence (S-4) involved the additional
discovery that $y(t-k)$ in sequence (S-3) was not essential for predicting
classical conditioning phenomena. The result was a neuronal model that
can be specified by the following equation:

$$\Delta w_i(t) = c\,x_i(t-k)\,\Delta y(t) \tag{6}$$

Actually,  the  form  the  model  took  in  the  computer  simulations  Sutton  and 
Barto  (1981)  reported  was  as  follows: 

$$\Delta w_i(t) = c\,\bar{x}_i(t)\,\Delta y(t) \tag{7}$$

where

$$\bar{x}_i(t) = \alpha\,\bar{x}_i(t-1) + x_i(t-1) \tag{8}$$

In equation (8), $\alpha$ is a positive constant. It can be seen that equation
(7) is of a form similar to equation (6) except that $x_i(t-k)$ is replaced
by $\bar{x}_i(t)$. $\bar{x}_i(t)$ represents an exponentially decaying trace of $x_i$
extending over a number of time steps.
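
The stimulus trace of equation (8) and the update of equation (7) can be sketched as below. This is only an illustration: the function names are hypothetical, the trace is assumed to start at zero, $\Delta y(t)$ is taken to be $y(t) - y(t-1)$ by analogy with the definition of $\Delta s(t)$, and $\alpha = 0.5$ is an arbitrary value chosen below one so that the trace decays.

```python
def stimulus_trace(x, alpha):
    """Equation (8): xbar(t) = alpha * xbar(t-1) + x(t-1), with xbar(0) = 0."""
    xbar = [0.0] * len(x)
    for t in range(1, len(x)):
        xbar[t] = alpha * xbar[t - 1] + x[t - 1]
    return xbar


def sutton_barto_update(w, xbar_t, y_t, y_prev, c=0.1):
    """Equation (7): delta_w(t) = c * xbar(t) * (y(t) - y(t-1))."""
    return w + c * xbar_t * (y_t - y_prev)


# A single-step input leaves a trace that halves on every later step.
trace = stimulus_trace([1.0, 0.0, 0.0, 0.0], alpha=0.5)
w_next = sutton_barto_update(0.0, trace[2], y_t=1.0, y_prev=0.0, c=0.2)
```

Because the trace outlasts the input itself, an increase in output two steps after the input can still change the weight.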

It was at this point that neuronal modeling intersected strongly
with the theoretical and experimental results of animal learning
researchers such as Kamin (1968) and Rescorla and Wagner (1972). Sutton
and Barto (1981) demonstrated that the model they proposed could be seen
as a temporally refined extension of the Rescorla-Wagner (1972) model.
Like the Rescorla-Wagner model, the Sutton-Barto model accounted for a
variety of classical conditioning phenomena including blocking,
overshadowing, and conditioned inhibition. Here was what could be
interpreted as a neuronal model (although Sutton and Barto did not insist
on that interpretation) making predictions similar to those of a whole
animal model! The Sutton-Barto model represented a milestone in terms of
the contact prospective neuronal models were making with the experimental
evidence of animal learning (Sutton and Barto, 1981; Barto and Sutton,
1982; Moore, Desmond, Berthier, Blazis, Sutton, and Barto, 1986; Blazis
and Moore, 1987).

However, the Sutton-Barto model still deviated from the experimental
evidence in a number of significant respects. One problem was that the
sensitivity of the model to conditioned stimulus durations caused the
model to yield inaccurate predictions for a variety of conditioned
stimulus-unconditioned stimulus configurations for which the conditioned
stimulus and unconditioned stimulus overlapped significantly. The model
also does not account for the initial positive acceleration in the
s-shaped acquisition curves observed in classical conditioning.

One approach to correcting the problems of the Sutton-Barto model
has been to utilize a variant of the adaptive heuristic critic algorithm
developed by Sutton (1984), and this has led to the temporal difference
model proposed by Sutton and Barto (1987). Temporal difference models,
as defined by Sutton and Barto (1987), utilize differences between
temporally successive predictions as a basis for learning. Sutton (1987)
notes that the earliest and most well known use of a temporal difference
(TD) method or model was that due to Samuel (1959) in his checker-playing
program. Other examples of TD methods or models include those due to
Witten (1977), Sutton and Barto (1981), Booker (1982), Hampson
(1983/1984), Sutton (1984), Gelperin, Hopfield, and Tank (1985), and
Holland (1986). The drive-reinforcement neuronal model proposed in this
report is an example of a temporal difference model.


Variants of the adaptive heuristic critic model (Barto, Sutton, and
Anderson, 1983; Sutton, 1984) represent one approach to solving the
problems of the Sutton-Barto model. Seeking to address these same
problems, I have adopted an alternative approach that has led to the
neuronal learning mechanism specified by equation (2). For this model,
the hypothesized sequence of critical events leading to learning is as
follows:

$$\Delta x_i(t-j) \rightarrow \Delta y(t) \rightarrow \Delta w_i(t) \tag{S-5}$$

where $j$ replaces $k$ and all of the critical events involve derivatives
with respect to time. The variable, $k$, was the time required for the
neuron to receive feedback regarding its earlier output, $y(t-k)$; $k$
reflected an instrumental conditioning orientation. The variable, $j$, is
simply an interstimulus interval reflecting a classical conditioning
orientation. Barto and Sutton had also considered using $\Delta x_i(t)$ instead
of $x_i(t)$ in their learning mechanism but decided it was unworkable. I
returned to this possibility of a differential learning mechanism, one
that correlates earlier derivatives of inputs with later derivatives of
outputs, and found a way to make it workable such that the problem with
conditioned stimulus duration effects was eliminated. The class of
differential learning mechanisms was independently discovered by Klopf
(1986), coming from the directions of neuronal modeling and animal
learning, and by Kosko (1986), coming from philosophical and mathematical
directions.

Sequence (S-5) implies the following kind of learning mechanism:

$$\Delta w_i(t) = c\,\Delta y(t)\,\Delta x_i(t-j) \tag{9}$$

However, I have found that the most workable form of the learning
mechanism involves adding multiple terms and multiple learning rate
constants to the right side of equation (9), the terms and constants
corresponding to a range of interstimulus intervals, $j$. Also, making
$\Delta w_i(t)$ proportional to the absolute value of $w_i(t-j)$ allows the model to
account for the initial positive acceleration in the acquisition curves
of classical conditioning. These refinements led to the neuronal
learning mechanism specified by equation (2) and repeated here:

$$\Delta w_i(t) = \Delta y(t) \sum_{j=1}^{\tau} c_j\,|w_i(t-j)|\,\Delta x_i(t-j) \tag{10}$$

where $\Delta x_i(t-j)$ must be greater than or equal to zero; otherwise,
$\Delta x_i(t-j)$ is set equal to zero for the purposes of equation (10). The
resulting model predicts the basic categories of classical conditioning
phenomena, as will be demonstrated in the next section.
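
A minimal sketch of equation (10) for a single synapse is given below. The function name and the history-list representation are hypothetical, $\Delta x_i(t-j)$ is assumed to mean $x_i(t-j) - x_i(t-j-1)$ by analogy with the definition of $\Delta s(t)$, and the learning rate constants in the example are illustrative values only.

```python
def dr_delta_w(w_hist, x_hist, delta_y, c):
    """Equation (10) for a single synapse.

    w_hist, x_hist: sequences whose last element is the value at time t.
    delta_y: change in neuronal output, y(t) - y(t-1).
    c: learning rate constants [c_1, ..., c_tau].
    """
    t = len(x_hist) - 1
    total = 0.0
    for j, c_j in enumerate(c, start=1):
        if t - j < 1:
            break  # not enough history to form delta_x(t-j)
        delta_x = x_hist[t - j] - x_hist[t - j - 1]
        delta_x = max(delta_x, 0.0)  # negative input changes are set to zero
        total += c_j * abs(w_hist[t - j]) * delta_x
    return delta_y * total


# Input onset two steps before an increase in output (j = 2 term is active).
onset_change = dr_delta_w([0.1] * 4, [0.0, 1.0, 1.0, 1.0], delta_y=0.5, c=[5.0, 3.0])
# Input offset alone contributes nothing, whatever the change in output.
offset_change = dr_delta_w([0.1] * 3, [1.0, 0.0, 0.0], delta_y=0.5, c=[5.0, 3.0])
```

Only positive changes in the input, weighted by the earlier weight magnitude and the constant for the corresponding interstimulus interval, are correlated with the later change in output.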


SECTION  3 


CLASSICAL CONDITIONING: PREDICTIONS OF THE NEURONAL MODEL


Classical conditioning phenomena are basic to learning. I will show
in this section that the drive-reinforcement neuronal model predicts a
wide range of classical conditioning phenomena. This will be
demonstrated by means of computer simulations of the model.

The neuronal model that was simulated is shown in Figure 3. The
input-output (I/O) relationship assumed for the neuron was that of
equation (1). The neuronal learning mechanism (L.M.) was that of
equation (2) with the refinement noted earlier: whenever $\Delta x_i(t-j)$
was less than zero, $\Delta x_i(t-j)$ was set equal to zero for the
purpose of calculating $\Delta w_i(t)$. In the computer simulations, a
conditioned stimulus (CS) or unconditioned stimulus (US) that was
presented to the neuron had an amplitude that ranged between zero and one
and a duration that was specified in terms of the times of stimulus onset
and offset. In the figures showing results of the computer simulations,
each CS-US configuration is graphed so the reader may see the relative
amplitudes and durations of stimuli at a glance. (For exact values for
any of the parameters for the computer simulations, the Appendix should
be consulted.)
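
A stimulus of the kind just described, with an amplitude between zero and one and a duration set by onset and offset times, might be represented as a rectangular pulse. The sketch below is an assumption about representation only: the function name, the half-open interval convention (on at onset, off at offset), and the example timings are not taken from the report.

```python
def rectangular_stimulus(n_steps, onset, offset, amplitude):
    """Return n_steps samples equal to amplitude for onset <= t < offset, else zero."""
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must lie between zero and one")
    return [amplitude if onset <= t < offset else 0.0 for t in range(n_steps)]


# A CS whose offset coincides with US onset (hypothetical timings).
cs = rectangular_stimulus(8, onset=2, offset=5, amplitude=0.2)
us = rectangular_stimulus(8, onset=5, offset=7, amplitude=0.5)
```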


Each stimulus was presented to the simulated neuron through both an
excitatory and an inhibitory synapse so that the neuronal learning
mechanism had, for each input, both an excitatory and an inhibitory
weight available for modification. The learning mechanism could then
choose to modify one or the other weight or both in each time step. In
the case of an actual (biological) neuron, if a CS is not represented by
both excitatory and inhibitory synapses, the individual neuron will be
constrained in terms of what classical conditioning phenomena it can
manifest. It will be seen in the simulations below that, for a
drive-reinforcement neuron, some classical conditioning phenomena require
only excitatory plastic synapses and some require only inhibitory plastic
synapses. Those classical conditioning phenomena requiring both
excitatory and inhibitory plastic synapses would have to emerge at a
higher level if the individual neurons involved had their CSs represented
by only excitatory or only inhibitory plastic synapses.

Figure 3. The drive-reinforcement neuronal model employed in the
computer simulations. This is a specific example of the more general
model shown in Figure 1. The description that was given in Figure 1
applies here. In addition, each CS and US is represented by an
excitatory (+) and an inhibitory (-) synapse. The efficacies of synapses
[i.e., the synaptic weights, $w_i(t)$] are variable (plastic) for synapses
mediating CSs and fixed (nonplastic) for synapses mediating USs.

In the discussion that follows, a conditioned or unconditioned
stimulus and the associated $x_i(t)$ in Figure 3 are identical. For
example, $x_1(t)$ and $x_2(t)$ are one and the same as CS$_1$. The weights
associated with the synapses carrying the unconditioned stimulus were
fixed (nonplastic) and the remaining synaptic weights were variable
(plastic).

The conditioned stimulus or unconditioned stimulus that is described
should, perhaps, more properly be referred to as a neuronal conditioned
stimulus or a neuronal unconditioned stimulus because it is the stimulus
that is reaching the neuron, not the stimulus that is reaching the whole
animal. However, for the sake of simplicity in the discussion, I will
refer to these neuronal input signals as conditioned and unconditioned
stimuli or, simply, CSs and USs. Likewise, the output, $y(t)$, of the
neuron would more properly be referred to as the neuronal conditioned or
unconditioned response but I will usually refer to the neuronal response
as the conditioned response (CR) or unconditioned response (UR). Built
into these terminological conventions is the assumption that stimuli and
responses external to an animal's nervous system do not differ
fundamentally in form from the way stimuli and responses are represented
internal to the animal's nervous system. This assumption might not hold
up well at higher, cognitive levels of function but the assumption
appears reasonable as a starting point for testing the ability of a
neuronal model to predict fundamental learning phenomena.

Just as the range of $x_i(t)$ in the simulations was from zero to one,
as was noted when the range of CS and US amplitudes was discussed, so the
range of $y(t)$, the neuronal output, was from zero to one. Such a range
serves to model a finite range of frequencies for neuronal inputs and
outputs. Actual frequencies of biological neurons range up to several
hundred spikes per second in the case of neocortical neurons firing for
brief intervals (Lynch, Mountcastle, Talbot, and Yin, 1977). Therefore,
one could multiply the neuronal input and output amplitudes used in the
simulations by, say, three hundred if one desires to see more realistic
numbers. However, for the purposes of the simulations to be reported,
the relative magnitudes of the parameters are important, not the absolute
magnitudes.

The number of synapses impinging on the simulated neuron is eight,
as is indicated in Figure 3. This corresponds to three possible CSs and
one US. The absolute values of the plastic synaptic weights mediating
the CSs have a lower bound of 0.1 and, when the simulations began, these
excitatory and inhibitory weights were set at plus and minus 0.1,
respectively. (For exceptions to this statement, see the Appendix; in
some simulations, inhibitory synaptic weights were set equal to zero
because they did not play a significant role and it simplified the
graphs.) The neuronal threshold was set at zero because, at higher
values of the neuronal threshold, the form of the model's predictions did
not change. The only effect of higher thresholds was that more trials
were required for the synaptic weights to reach their asymptotic values.
For the learning mechanism, the learning rate constants, $c_1$ through $c_5$,
were set at values such that $c_j > c_{j+1}$. As noted earlier, this is
reasonable if one views each time step as being equivalent to one-half
second because then $c_1$ is maximal, corresponding to a nominal optimal
interstimulus interval of one-half second. Successive $c$-values then
decrease as the interstimulus interval increases. As also noted earlier,
$c_0$ and $c_6$ were set equal to zero, corresponding to interstimulus
intervals of zero and three seconds, respectively. Thus, in the
simulations, $j$ ranged from one to five; i.e., $\tau$ was set equal to five.

What follows are the results of computer simulations of the
drive-reinforcement neuronal model for a variety of CS-US configurations.
The predictions of the model are examined for delay and trace
conditioning, CS and US duration and amplitude effects, partial
reinforcement effects, interstimulus interval effects including
simultaneous conditioning, second-order conditioning, conditioned
inhibition, extinction, reacquisition effects, backward conditioning,
blocking, overshadowing, compound conditioning, and discriminative
stimulus effects.

During a simulation, the CS-US configuration was presented once in
each trial. The values of the synaptic weights at the end of each trial
were recorded and plotted as a function of the trial number. These
graphs of synaptic weights versus trials are shown in the figures
accompanying the discussion below. In addition, in each figure, the
CS-US configuration is graphed along with the response of the neuron
during the last trial. The neuronal response is labeled "Y," designating
a plot of $y(t)$ for the last trial of the simulation. The definition of a
trial should be noted. The CS-US configuration, or what is referred to
in the figures as the "stimulus configuration", defines a trial. Thus,
the graphed stimulus configurations in the figures are intended to show
not only relative times of onset and offset along with amplitudes of
stimuli but also the number of times a stimulus was presented during a
trial. What will be seen in the figures is that the behavior of the
synaptic weights, as predicted by the drive-reinforcement neuronal model,
mirrors the observed behavior of animals as they are learning during
classical conditioning experiments.

Before discussing the individual simulations, two remarks are in
order regarding the graphs of synaptic weights versus trials. Any
synaptic weight that played a significant role for the conditioning
phenomenon being discussed is shown in the accompanying graph. Any
synaptic weight that played no significant role (typically meaning that
the neuronal learning mechanism did not alter the weight at all during
the simulation) is not shown in order to simplify the graphs. Also, data
points for the synaptic weight values at the end of each trial are not
shown on the graphs because the resulting density of the data points
would be excessive and because the data points fall exactly on the
(theoretical) curves that have been drawn.


with food (the US). The observed result in such experiments is that
conditioned excitation develops. The bell becomes excitatory with
respect to the salivary gland. In addition, it is observed that the
amount of salivation in response to the bell alone (measured with
occasional test probes) increases with increasing trials such that an
s-shaped or sigmoid curve results when the amount of salivation is
plotted versus the trial number. That is to say, the amount of
salivation in response to the bell alone, as a function of trials,
positively accelerates initially and then negatively accelerates as an
asymptotic level of conditioning is approached (Pavlov, 1927). Spence
(1956) has observed that the acquisition curves of classical conditioning
are always s-shaped, providing that the experiments are done carefully
enough to capture the initial positive acceleration and the later
negative acceleration. For example, Spence (1956, pp. 68-70) states that
acquisition curves that "do not exhibit an initial, positively
accelerated phase do not do so either because they do not start at zero
level of conditioning or because the conditioning is so rapid that the
period of initial acceleration is too brief to be revealed except by very
small groups or blocks of trials."

Figure 4 shows the predicted acquisition curves of three neuronal
models for delay conditioning. In Figure 4(a), the results of a
simulation of the model proposed by Hebb (1949) are shown. For the
Hebbian model, the input-output relationship is the same as for the
drive-reinforcement model and is, therefore, specified by equation (1).
The Hebbian learning mechanism has already been noted and is specified by
equation (3). It can be seen in Figure 4(a) that if a Hebbian neuron
were driving the salivary gland, the amount of saliva produced in
response to the bell alone as a function of trials would exhibit an
essentially linear relationship because the excitatory synaptic weight
associated with the CS varies in an essentially linear fashion with the
trial number. Also, it may be noted that the Hebbian learning mechanism
does not yield an asymptotic synaptic weight value but, rather, continues
to increase the synaptic weight indefinitely or, of course, until an
upper bound would be reached.

In Figure 4(b), the results of a simulation of the Sutton-Barto
(1981) model are shown. The Sutton-Barto learning mechanism was
specified earlier in equations (7) and (8). The model's input-output
relationship is that of equation (1). The model is seen to predict a
negatively accelerated acquisition curve in that the excitatory synaptic
weight associated with the CS negatively accelerates with increasing
trials. It may be noted that the Rescorla-Wagner (1972) model also
predicts a negatively accelerated acquisition curve, as have earlier
whole animal models [see, for example, a model due to Estes (1950)].

In Figure 4(c), the results of a simulation of the
drive-reinforcement model are shown. The model is seen to predict
an s-shaped acquisition curve: conditioned excitation develops,
first through a positively accelerating phase and then through a
negatively accelerating phase. The drive-reinforcement model is
thus seen to be consistent with this aspect of the experimental evidence
of delay conditioning.

Some reasons why the drive-reinforcement model yields an s-shaped
acquisition curve may be noted. The initial positive acceleration is due
to the efficacy of the relevant synapse appearing as a factor on the
right side of equation (2). Thus, as the learning mechanism increases
the efficacy of the synapse, the future rate of change of the efficacy of
the synapse is also caused to increase. With continued conditioning,
another process comes to dominate, yielding the eventual negative
acceleration in the acquisition curve. The negative acceleration is due
to $\Delta y(t)$ decreasing with continued conditioning. In effect, $\Delta y(t)$
moves to an earlier point in time with conditioning, becoming $\Delta y(t-j)$
where $j$ is the interstimulus interval. Thus, throughout the conditioning
process, increasing values of $w_i(t-j)$ are competing with decreasing
values of $\Delta y(t)$ in equation (2). Rapidly increasing values of $w_i(t-j)$
prevail initially and rapidly decreasing values of $\Delta y(t)$ prevail later,
yielding the respective positive and negative accelerations in the
acquisition curve.
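
The competition between a growing weight and a shrinking $\Delta y(t)$ can be illustrated with a small simulation of delay conditioning. This is a sketch under stated assumptions, not a reproduction of the report's simulations: the input-output relation is assumed to be the weighted input sum minus a zero threshold, bounded to the interval from zero to one; only one excitatory CS weight is plastic; the US weight is fixed at 1.0; and the constants in `C`, the stimulus timings, and the amplitudes are illustrative values rather than the Appendix parameters.

```python
C = [5.0, 3.0, 1.5, 0.75, 0.25]  # c_1..c_5 with c_j > c_(j+1); assumed values
W_MIN = 0.1                       # lower bound on the plastic excitatory weight
W_US = 1.0                        # fixed US weight; assumed value


def pulse(n_steps, onset, offset, amplitude):
    """Rectangular stimulus: amplitude for onset <= t < offset, else zero."""
    return [amplitude if onset <= t < offset else 0.0 for t in range(n_steps)]


def run_trial(w_start, cs, us, theta=0.0):
    """Run one trial and return the CS weight at the end of the trial."""
    w = [w_start]          # w[t] is the CS weight in effect at time step t
    y_prev = 0.0
    for t in range(len(cs)):
        # Assumed input-output relation: weighted sum minus threshold,
        # bounded to the interval [0, 1].
        y = min(1.0, max(0.0, w[t] * cs[t] + W_US * us[t] - theta))
        delta_y = y - y_prev
        total = 0.0
        for j, c_j in enumerate(C, start=1):
            if t - j >= 1:
                # Negative changes in the input are set to zero.
                delta_x = max(cs[t - j] - cs[t - j - 1], 0.0)
                total += c_j * abs(w[t - j]) * delta_x
        w.append(max(w[t] + delta_y * total, W_MIN))
        y_prev = y
    return w[-1]


def simulate(n_trials=150, w0=0.1):
    """Return the CS weight before the first trial and after each trial."""
    cs = pulse(30, onset=10, offset=13, amplitude=0.2)   # CS offset at US onset
    us = pulse(30, onset=13, offset=16, amplitude=0.5)
    weights = [w0]
    for _ in range(n_trials):
        weights.append(run_trial(weights[-1], cs, us))
    return weights


weights = simulate()
increments = [b - a for a, b in zip(weights, weights[1:])]
```

With these assumed settings the per-trial weight change is proportional to $w\,(0.5 - 0.2w)$, so the increments in `increments` first grow with the weight and then shrink as the CR amplitude $0.2w$ approaches the UR amplitude of 0.5, and the weight levels off near 2.5.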

CS and US duration effects

A careful reader may note that, in Figure 4, the same CS-US
configuration is not used for the simulation of each of the models. The
Hebbian model's CS offset coincides with the offset of the US whereas the
Sutton-Barto and drive-reinforcement models' CSs have their offset occurring
at the time of US onset. I chose those particular CS-US configurations
because, otherwise, the Hebbian and Sutton-Barto models would not have
predicted the development of conditioned excitation. Both of these
models are sensitive to CS durations in a way that is not consistent with
the experimental evidence, the models predicting no conditioning or
conditioned inhibition for some CS-US configurations that,
experimentally, are known to yield conditioned excitation. The
effect of CS duration is examined systematically in Figure 5, where
each model's predictions are shown for the same set of three CS-US
configurations. I will specify how the three CS-US configurations
differ and then discuss each model's predictions for each of the
three configurations.

In Figure 5, CS$_1$ offset occurs at the time of US onset, CS$_2$ offset
occurs at the time of US offset, and CS$_3$ offset occurs one time step
after US offset. Experimentally, it is known that conditioned excitation
(corresponding in the neuronal models to the growth of positive synaptic
weights) is observed in all three cases. In general, the efficacy of
delay conditioning is a strong function of the time of CS onset and
relatively independent of CS duration (Kamin, 1965).

In Figure 5(a), it is seen that the Hebbian model predicts
conditioned excitation for CS$_2$ and CS$_3$ but not for CS$_1$. In Figure 5(b),
it is seen that the Sutton-Barto model predicts conditioned excitation
for CS$_1$ and strong conditioned inhibition for CS$_2$ and CS$_3$. In Figure
5(c), it is seen that, consistent with the experimental evidence, the
drive-reinforcement model predicts conditioned excitation for all three
CSs and, in each case, predicts an s-shaped acquisition curve.

In Figure 5(c), more detailed aspects of the drive-reinforcement
model's predictions may be noted. For example, the model predicts a
particular ranking of CSs in terms of initial rate of conditioning and
asymptotic synaptic weight value as a function of CS duration. The
experimental literature does not, at this point, permit the accuracy of
these more detailed predictions to be assessed. Furthermore, whole
animal data may be insufficient to test these predictions, in that higher
level attention mechanisms may play a significant role when CS
durations are extended beyond the US (Ayres, Albert, and Bombace, 1987).
Experiments at the level of the single neuron may be required to test
these predictions.

Figure 5. Results of simulated delay conditioning experiments with (a)
Hebbian, (b) Sutton-Barto, and (c) drive-reinforcement neuronal models.
The effect of CS duration is examined. (See text and Appendix for
details.)

Regarding the effects of US duration, the drive-reinforcement model
predicts increasing rates of conditioning as the US duration increases
(see Figure 6) and this is consistent with the experimental evidence
(Ashton, Bitgood, and Moore, 1969; Gormezano, Kehoe, and Marshall, 1983).

Thus far, the drive-reinforcement neuronal model's predictions have
been demonstrated to be accurate for three categories of classical
conditioning phenomena: (a) the form of the acquisition curve in delay
conditioning, (b) relative insensitivity to CS duration, and (c) US
duration effects. The predictions of the model for a variety of other
CS-US configurations will now be examined, these CS-US configurations
corresponding to what appear to be the remaining basic categories of
classical conditioning phenomena. While the predictions of the Hebbian
and Sutton-Barto models for these CS-US configurations will not be shown,
it should be noted that the Hebbian model's predictions frequently
deviate substantially from experimentally observed behavior, examples of
this having already been seen in Figures 4 and 5. (Of course, it remains
a theoretical possibility that biological neurons are Hebbian and that
classical conditioning phenomena are emergent, resulting from the
interactions of perhaps large numbers of Hebbian neurons. Experimental
tests to be discussed later will be required to resolve this question.)
The predictions of the Sutton-Barto model are similar to those of the
drive-reinforcement model, if one is careful, in the case of the
Sutton-Barto model, not to use substantially overlapping CSs and USs and
if one accepts that the Sutton-Barto model's predicted acquisition curves
are not s-shaped.

CS  and  US  amplitude  effects 

It is known that faster conditioning results as the intensity of the
CS increases (Pavlov, 1927; see review by Moore and Gormezano, 1977). As
is seen in Figure 7, the drive-reinforcement model predicts this
relationship. Shown in Figure 7 are CSs of three different amplitudes,
each being reinforced by a US of the same amplitude. The predicted rate
of conditioning is seen to increase as the amplitude or intensity of the
CS increases. For the three CSs, the rank ordering of the asymptotic
values of the synaptic weights is the reverse of the rank ordering of the
rates of acquisition because a low amplitude CS requires a larger
asymptotic synaptic weight to yield the same eventual CR amplitude as can
be obtained with a high amplitude CS and a lower asymptotic synaptic
weight.

Figure 7. The drive-reinforcement model's predictions of the effects of
CS amplitude. Consistent with the experimental evidence, as the CS
amplitude decreases, the rate of growth of the excitatory synaptic
weights associated with the reinforced CSs decreases. Asymptotic
excitatory synaptic weight values vary inversely with CS amplitude
because a lower CS amplitude requires a higher excitatory asymptotic
synaptic weight value to yield a CR amplitude equal to the UR amplitude.
(See text and Appendix for details.)

Regarding US amplitude effects, Moore and Gormezano (1977, p. 115)
note that "Within limits, the rate of acquisition and level of
performance of a CR are increasing functions of the intensity of the US."
This is predicted by the drive-reinforcement model as can be seen in
Figure 8 where three identical CSs are shown being reinforced by USs of
decreasing amplitude. It is seen that both the rate of acquisition and
the asymptotic weight value decrease as the US amplitude decreases.
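
The amplitude relationships described above can be made explicit with a
simplified worked equation. This is only a sketch: it assumes a single
CS of amplitude x_CS, neglects the neuronal threshold and any inhibitory
weight, and takes conditioning to stop once the CR amplitude equals the
UR amplitude, as stated in the caption of Figure 7. The symbols x_CS,
w*, A_CR, and A_UR are introduced here for illustration and are not the
report's notation.

```latex
A_{CR} \;=\; w^{*}\, x_{CS} \;=\; A_{UR}
\qquad\Longrightarrow\qquad
w^{*} \;=\; \frac{A_{UR}}{x_{CS}}
```

Under these assumptions, halving the CS amplitude doubles the asymptotic
weight, whereas a weaker US, to the extent that it evokes a smaller UR,
lowers the asymptotic weight.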

CS  preexposure  effects 

CS preexposure refers to nonreinforced presentations of a CS prior
to reinforced presentations. The observed result is that CS preexposure
retards subsequent acquisition of the conditioned response when
reinforced presentations of the US begin but the experimental evidence
also suggests that the preexposed CS does not become inhibitory [see
review by Flaherty (1985) who cites, e.g., Rescorla (1971), Reiss and
Wagner (1971), and Solomon, Brennan, and Moore (1974)]. As Flaherty (1985)
notes, one possible explanation for CS preexposure effects is that the
animal may, during the nonreinforced CS presentations, learn not to
attend to the stimulus. If this is the case, CS preexposure effects
would not be predicted by a neuronal model. Rather, such effects would
require network-level considerations for their prediction. The related
subject of US preexposure effects will be discussed later when the
phenomenon of blocking is considered.

Partial  reinforcement  effects 

In the case of partial reinforcement, a CS is not always followed by
a US. This can be contrasted with continuous reinforcement, in which
case the US always follows the CS. The observed result of partial
reinforcement is a reduced rate of conditioning and sometimes a reduced
asymptotic level of responding (Gormezano, Kehoe, and Marshall, 1983)
relative to the rates and asymptotic levels observed for continuous
reinforcement. The drive-reinforcement model's predictions are
consistent with this, as can be seen in Figure 9, where CS1 is reinforced
100 percent of the time, CS2 is reinforced 50 percent of the time, and
CS3 is reinforced 25 percent of the time. In Figure 9, it is seen that
rates of acquisition and asymptotic weight values are predicted to
decrease as the percent reinforcement decreases.

Figure 8. The drive-reinforcement model's predictions of the effects of
US amplitude. Consistent with the experimental evidence, as the US
amplitude decreases, the rates of growth and asymptotic values of
excitatory synaptic weights associated with the reinforced CSs decrease.
(See text and Appendix for details.)

Trace  conditioning 

Trace conditioning is an experimental procedure in which CS offset
precedes US onset. The time between CS offset and US onset is termed the
trace interval. In general, the longer the trace interval, the lower the
rate of acquisition and the lower the asymptotic level of conditioning
[see Flaherty (1985) for a review of the experimental evidence]. The
drive-reinforcement model predicts these relationships, as can be seen in
Figure 10, where three CS-US configurations are shown. It can be seen
that increasing trace intervals yielded both lower rates of acquisition
and lower asymptotic synaptic weight levels. In terms of the
drive-reinforcement model's dynamics, some reasons that trace
conditioning is less effective than delay conditioning are that the Δx
that occurs at CS onset is paired not only with the positive Δy of US
onset but also with the negative Δy of CS offset and, furthermore, the
interstimulus interval for the negative Δy has a larger learning rate
constant associated with it than does the interstimulus interval for the
positive Δy.

Figure 9. The drive-reinforcement model's predictions of the effects of
partial reinforcement. Consistent with the experimental evidence, it is
seen that as the fraction of CSs that are reinforced decreases, so does
the rate of growth of excitatory synaptic weights associated with the
reinforced CSs. The drive-reinforcement model also predicts lower
asymptotic excitatory synaptic weight values as the percentage of
reinforced CSs decreases, an effect that is consistent with some partial
reinforcement studies. (See text and Appendix for details.)

Figure 10. The drive-reinforcement model's predictions of the effects of
trace conditioning. Consistent with the experimental evidence, as the
trace interval increases, the rates of growth and asymptotic values of
the excitatory synaptic weights associated with the reinforced CSs
decrease. (See text and Appendix for details.)
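
The account of trace conditioning given in the text can be illustrated
with a rough numerical sketch. It assumes that the weight change is the
product of Δy and a sum of learning rate constants times the absolute
weight times the earlier positive Δx, that the weight is held fixed
within a trial, and that the CS and US have unit amplitude. The
learning rate values, stimulus durations, and function names below are
hypothetical and are chosen only to show the trend.

```python
# Hypothetical learning-rate constants c_0..c_5 (illustrative values only).
# c_j weights a positive delta-x that occurred j time steps before delta-y.
C = [0.0, 5.0, 3.0, 1.5, 0.75, 0.25]


def c(j: int) -> float:
    """Learning-rate constant for an interval of j steps (0 outside 1..5)."""
    return C[j] if 0 <= j < len(C) else 0.0


def net_weight_change(trace: int, w_cs: float = 0.1, cs_dur: int = 1,
                      us_dur: int = 1, ur: float = 1.0) -> float:
    """Approximate net weight change over one trace-conditioning trial.

    The CS comes on at step 0 and goes off cs_dur steps later; the US
    comes on `trace` steps after CS offset and lasts us_dur steps.
    Holding the weight fixed within the trial is a simplification.
    """
    eligibility = abs(w_cs) * 1.0   # |w| times the positive delta-x at CS onset
    dy_cs_offset = -w_cs            # response to the CS drops back to zero
    dy_us_onset = ur                # response rises to the UR amplitude
    dy_us_offset = -ur              # response falls when the US ends
    return eligibility * (
        dy_cs_offset * c(cs_dur)
        + dy_us_onset * c(cs_dur + trace)
        + dy_us_offset * c(cs_dur + trace + us_dur)
    )


# Longer trace intervals pair US onset with smaller constants.
for trace in (1, 2, 3):
    print(trace, net_weight_change(trace))
```

In this sketch the negative Δy at CS offset is always weighted by the
constant for the shortest interval, while the positive Δy at US onset is
weighted by a smaller constant as the trace interval grows, so the net
change per trial shrinks.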


Interstimulus interval effects including simultaneous conditioning

The predictions of the drive-reinforcement model for a variety of
interstimulus intervals in delay conditioning are shown in Figure 11.
The interstimulus interval is defined to be the time between CS and US
onsets. In the case of CS1 in Figure 11, CS and US onsets are
simultaneous. This CS-US configuration is an example of what is referred
to as simultaneous conditioning. Citing Pavlov (1927) as well as Smith,
Coleman, and Gormezano (1969), Flaherty (1985) notes that "little or no
conditioning occurs with simultaneous CS and US onset." This is what the
drive-reinforcement model predicts. As can be seen in Figure 11, the
synaptic weight for CS1 remains unchanged during the sixty trials for
which the computer simulation was run. Flaherty (1985) goes on to note
that some conditioning has been reported for simultaneous CS and US
onsets in the case of fear conditioning (Burkhardt and Ayres, 1978;
Mahoney and Ayres, 1976). Thus, the experimental results with regard to
simultaneous conditioning appear complex and it can only be noted that
the predictions of the drive-reinforcement model appear to be consistent
with some of the experimental evidence.

For interstimulus intervals greater than zero, experimental results
suggest that a nominal interval of 500 ms (one time step in the
simulations) is optimal when conditioning short latency skeletal
reactions. With longer intervals, the efficacy of conditioning declines
until, for intervals exceeding a few seconds, no conditioning is observed
(see review by Moore and Gormezano, 1977). This is consistent with the
predictions of the drive-reinforcement model. In Figure 11, it is seen
that conditioning is most rapid for an interstimulus interval of one time
step in the case of CS2, progressively slower for intervals of three and
five time steps in the cases of CS3 and CS4, respectively, with no
conditioning manifesting for an interstimulus interval of six time steps
in the case of CS5.

Figure 11. The drive-reinforcement model's predictions of the effect of
the interstimulus interval. Consistent with the experimental evidence
and consistent with the assignment of values to the learning rate
constants, cj, the model is seen to predict no conditioning for
simultaneous CS and US onsets and then decreased rates of conditioning
as the interstimulus interval increases beyond the optimal interstimulus
interval employed with CS2. Interstimulus intervals were as follows:
zero time steps for CS1, one time step for CS2, three time steps for CS3,
five time steps for CS4, and six time steps for CS5. (See text and
Appendix for details.)

An alternative way of viewing the simulation results shown in Figure
11 is to see them as confirming the expected consequences of assigning
the learning rate constants, cj, in the manner described earlier.
Namely, c0 and c6 were set equal to zero, c1 was assigned the highest
value and c2 through c5 were assigned progressively lower values. Thus,
the simulation results in Figure 11 reflect the fact that the learning
rate constants were chosen consistent with the empirical evidence
regarding interstimulus interval effects.
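
The assignment just described can be written down as a small lookup. In
the sketch below only the pattern is taken from the text (zero for
intervals of zero and six time steps, a maximum at one time step, and
progressively lower values from two through five). The numerical values,
the treatment of intervals longer than six steps as zero, and the name
learning_rate are assumptions made for illustration.

```python
# Hypothetical values; only the ordering follows the text:
# c_0 = c_6 = 0, c_1 largest, c_2 through c_5 progressively smaller.
LEARNING_RATE_CONSTANTS = {0: 0.0, 1: 5.0, 2: 3.0, 3: 1.5,
                           4: 0.75, 5: 0.25, 6: 0.0}


def learning_rate(isi_steps: int) -> float:
    """Return c_j for an interstimulus interval of isi_steps time steps."""
    if isi_steps < 0:
        raise ValueError("interstimulus interval must be non-negative")
    # Intervals beyond the table are assumed to contribute nothing.
    return LEARNING_RATE_CONSTANTS.get(isi_steps, 0.0)


# Interstimulus intervals used for CS1 through CS5 in Figure 11.
for cs, isi in zip(("CS1", "CS2", "CS3", "CS4", "CS5"), (0, 1, 3, 5, 6)):
    print(cs, isi, learning_rate(isi))
```

Read against Figure 11, a zero constant corresponds to a weight that
does not change (CS1 and CS5), and a smaller nonzero constant to slower
acquisition (CS3 and CS4 relative to CS2).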


Second-order  conditioning 

Second-order conditioning is an experimental procedure in which one
CS is reinforced by another CS, the latter CS having been previously
reinforced by a US. Pavlov (1927) reported that this procedure yielded
conditioning in the second stage, the second CS coming to elicit the
conditioned response originally elicited only by the first CS. However,
in discussing second-order conditioning, Rescorla (1980, pp. 3-4)
comments on "a historically nagging issue". Rescorla states that the
"issue concerns whether, in fact, second-order conditioning is a real and
powerful phenomenon. Although Pavlov reported its occurrence, he
described it as transient. Subsequent authors have often been
less than enthusiastic about its reality." This is interesting
because the drive-reinforcement model predicts that second-order
conditioning will not be as strong as first-order conditioning and that
second-order conditioning will be transient. Simulation results that are
the basis of this prediction are shown in Figure 12 where, in stage one
of conditioning, CS1 is reinforced by a US, achieving an asymptotic
synaptic weight value of just a little more than four. After delay
conditioning in stage one (trials 1-60), second-order conditioning occurs
in stage two (trials 61-200). The drive-reinforcement model predicts
significantly weaker conditioning in stage two, the synaptic weight
associated with CS2 peaking at a value between one and two. Furthermore,
the transient nature of second-order conditioning, as reported by Pavlov
(1927), is predicted by the model. In stage two of the simulated
second-order conditioning experiment, after the CS2 synaptic weight
peaks, the model predicts the subsequent decline of the weight due to
what is essentially an extinction process. Had the simulation been
carried out for further trials, the CS2 synaptic weight would have
declined to the lower bound of 0.1.

Conditioned inhibition

Delay conditioning yields conditioned excitation; i.e., the CS comes
to excite the conditioned response (CR). An alternative procedure
developed by Pavlov (1927) yields what he termed conditioned inhibition;
i.e., a CS would come to inhibit a CR that otherwise would have
manifested.

One of Pavlov's procedures for demonstrating conditioned
inhibition was as follows. In the first stage of conditioning,
Pavlov would utilize a delay conditioning procedure to render CS1
excitatory with respect to a CR. Then, in a second stage of
conditioning, he would continue to reinforce CS1 with the US but would


[Figure 12 graphic. Panel "Stimulus configuration and response" shows
CS1, CS2, US, and Y for trials 1-60 (conditioned excitation) and trials
61-200 (second-order conditioning); the plot is drawn against trial
number.]

Figure 12. The drive-reinforcement model's predictions of the effects of
second-order conditioning. Consistent with the experimental evidence,
after delay conditioning in stage 1 (trials 1-60), the excitatory
synaptic weight associated with CS1 extinguishes in stage 2 (trials
61-200) during second-order conditioning. Also consistent with the
experimental evidence, the excitatory synaptic weight associated with CS2
increases initially during stage 2 and then decreases. (See text and
Appendix for details.)


also present an unreinforced CS1-CS2 pair to the animal. During the
second stage of conditioning, the animal's response to CS1 unpaired would
decrease initially and then return to its original level. The animal's
response to the CS1-CS2 pair would decrease to zero. Furthermore, Pavlov
was able to demonstrate that CS2 became a conditioned inhibitor in that,
after stage-two conditioning, if CS2 was paired with another CS, say CS3,
that was known, by itself, to be a conditioned exciter, the CR associated
with CS3 was, in general, reduced or eliminated.

The drive-reinforcement model predicts this behavior, as can be seen
in Figure 13. In stage one (trials 1-70) of the simulated conditioning,
CS1 is reinforced by a US such that conditioned excitation develops, with
the progress of the excitatory weight, w1(E), exhibiting the usual
s-shaped acquisition curve. Then, in stage two (trials 71-200), CS1
unpaired is reinforced by the US once in each trial while the CS1-CS2
pair is also presented once during each trial and the pair is
unreinforced. The model predicts that the excitatory weight associated
with CS1 will decrease initially and then return to its previous level,
mirroring the behavior Pavlov observed with his animals. Also, the model
predicts that the inhibitory weight, w2(I), associated with CS2, will
grow stronger as stage two conditioning proceeds, consistent with
Pavlov's observation that CS2 becomes a conditioned inhibitor.
(Regarding the notation employed here, an "E" or an "I" in parentheses
following "wi" signifies an excitatory or inhibitory weight,
respectively. This notation involves a degree of redundancy in that
excitatory weights will always be positive and inhibitory weights will
always be negative so, in the graphs, excitatory and inhibitory weights
for a particular CS could be distinguished on that basis.)

Figure 13. Results of a simulated classical conditioning experiment
modeled after experiments performed by Pavlov (1927), in which
conditioned excitation, conditioned inhibition, and extinction paradigms
are employed. (See text and Appendix for details.)

Because the decrease in the excitatory weight associated with CS1
during the second stage of conditioning and then its subsequent return to
the asymptotic level achieved in the first stage of conditioning may seem
surprising, a few words of explanation may be in order. The initial
decrease is due to the occurrence of the unreinforced CS1-CS2 pair in
that the onset of the CS1-CS2 pair yields a positive Δx1 that is followed
by a negative Δy at the time of termination of CS1 and CS2. The negative
Δy occurs because, with an unreinforced pair, no US onset occurs at the
time of CS1-CS2 offset and thus there is nothing to cause the neuronal
response to be sustained. The drive-reinforcement learning mechanism
yields negative Δw's whenever a positive Δxi is followed within τ time
steps by a negative Δy. Thus, the excitatory weight associated with CS1
decreases initially in stage two of conditioning. Similarly, the
inhibitory weight associated with CS2 is decreasing (i.e., becoming more
negative or becoming stronger in terms of its absolute value) because CS2
onset yields a positive Δx2 that is followed by a negative Δy at the
time of CS1-CS2 offset. The excitatory weight associated with CS1 ceases
to decrease and starts increasing when the conditioned inhibition becomes
sufficient, such that the positive Δy following the onset of CS1 unpaired
with CS2 is larger than the negative Δy following the onset of CS1-CS2
paired. The inhibitory weight associated with CS2 continues to decrease
(become more strongly inhibitory) because its onset, yielding a positive
Δx2, continues to be followed by a negative Δy until the conditioned
inhibition of CS2 becomes sufficient to cancel the conditioned excitation
of CS1, at which point the CS2 inhibitory weight, w2(I), approaches its
asymptotic level. At the same time, the CS1 excitatory weight, w1(E),
approaches its asymptotic level, equal to its prior asymptotic level,
because when the CS2 conditioned inhibition cancels the CS1 conditioned
excitation, the reinforcement of CS1 unpaired is the only event in each
trial that yields a nonzero Δy following a positive Δx. Thus, toward
the end of stage two conditioning, the situation in terms of positive
Δx's followed by nonzero Δy's is similar to that which occurred in stage
one.
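
The sign relationships used in this explanation can be checked with a
minimal sketch of a weight update. It assumes that the learning rule
has the form Δw(t) = Δy(t) times the sum, over j = 1 to τ, of
cj |w(t-j)| Δx(t-j), with negative Δx treated as zero; that form is
taken as an assumption here and should be checked against the statement
of the model earlier in the report. The function name, argument layout,
and numerical values are hypothetical.

```python
from typing import Sequence


def weight_change(delta_y: float,
                  past_delta_x: Sequence[float],
                  past_w: Sequence[float],
                  c: Sequence[float]) -> float:
    """Assumed rule: dw(t) = dy(t) * sum_j c_j * |w(t-j)| * max(dx(t-j), 0).

    Element k of each sequence refers to j = k + 1 time steps in the past.
    Negative changes in the input are ignored (treated as zero).
    """
    eligibility = sum(cj * abs(wj) * max(dxj, 0.0)
                      for cj, wj, dxj in zip(c, past_w, past_delta_x))
    return delta_y * eligibility


c = [5.0, 3.0, 1.5, 0.75, 0.25]   # hypothetical c_1..c_5 (tau = 5)
dx = [0.0, 0.0, 1.0, 0.0, 0.0]    # CS onset occurred three steps ago
w_exc = [1.0] * 5                 # recent history of an excitatory weight
w_inh = [-0.5] * 5                # recent history of an inhibitory weight

print(weight_change(+0.5, dx, w_exc, c))  # reinforced: excitatory weight grows
print(weight_change(-0.5, dx, w_exc, c))  # unreinforced: excitatory weight shrinks
print(weight_change(-0.5, dx, w_inh, c))  # inhibitory weight becomes more negative
```

Because the absolute value of the weight enters the sum, a negative Δy
that follows CS onset drives both kinds of weight downward: the
excitatory weight toward zero and the inhibitory weight toward larger
negative values, which is the pattern described for stage two.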


Extinction and reacquisition effects

When conditioned excitation develops in conjunction with a CS, as
was the case for CS1 at the conclusion of stage one (trials 1-70) and
stage two (trials 71-200) of conditioning in Figure 13, if the CS
continues to be presented in a third stage of conditioning, this time
without reinforcement, then Pavlov (1927) observed that the CR
extinguishes; i.e., the CR decreases in magnitude, reaching zero with a
sufficient number of unreinforced presentations of the CS. In addition,
Pavlov inferred that conditioned inhibition developed during the
extinction process because he observed "spontaneous recovery" of the CR
with time and he also observed more rapid reacquisition of the CR if
reinforced presentations of the CS were resumed. The predictions of the
drive-reinforcement model are consistent with Pavlov's observations and
inferences. Note that in stage three (trials 201-300) of conditioning in
Figure 13, where CS1 is presented without reinforcement, the CS1
excitatory weight, w1(E), declines and the CS1 inhibitory weight, w1(I),
grows stronger, until they cancel one another, at which time the CR will
no longer appear.

Perhaps a few words are in order regarding the phenomenon of
spontaneous recovery following extinction. Spontaneous recovery refers
to the tendency of an extinguished conditioned response to return after
the CS is not presented for some period of time. It seems that
spontaneous recovery could be due to the state of the nervous system
changing sufficiently with time so that the conditioned inhibition that
may develop during the process of extinction becomes less effective. [As
noted above, Pavlov (1927) believed that conditioned inhibition developed
during the process of extinction. However, Rescorla (1969, p. 87) has
stated that "There is only meager evidence bearing on this question".]
If the hypothesized conditioned inhibition were to become less effective
because a change in the state of the nervous system resulted in fewer of
the conditioned inhibitory synapses being active, then it would become
easier for the conditioned response to manifest again. If this
explanation of spontaneous recovery is correct, a neuronal model would
not be expected to predict the phenomenon. A network model would be
required to generate the prediction.

In the third stage of conditioning in Figure 13, the
drive-reinforcement model makes one further prediction that has not yet
been discussed. In this simulation, not only was CS1 presented
unreinforced in stage three but the CS1-CS2 pair was also presented
unreinforced. Pavlov (1927) observed that under these circumstances, the
conditioned excitation associated with CS1 extinguished but the
conditioned inhibition associated with CS2 did not. This is predicted by
the drive-reinforcement model. In the third stage of conditioning in
Figure 13, notice that the inhibitory weight, w2(I), remains unchanged
during the unreinforced presentations of the CS1-CS2 pair. This
prediction of the drive-reinforcement model differs from that of the
Rescorla-Wagner model of classical conditioning. As Rescorla and Wagner
(1972) point out, their model is inconsistent with the experimental
evidence of conditioned inhibition studies in that the model predicts the
extinction of conditioned inhibition. The drive-reinforcement model does
not make this prediction because the positive Δx occurring at the time
of CS2 onset is not followed by a positive Δy.

Pavlov (1927) reported that after extinction of a CR, if reinforced
presentations of the CS were resumed, then the CR would be reacquired
more rapidly than during the first series of reinforced trials. The
drive-reinforcement model predicts this reacquisition effect, as can be
seen in Figure 14 where delay conditioning occurs in stage one (trials
1-70), extinction of the CR occurs in stage two (trials 71-140), and
reacquisition of the CR occurs in stage three (trials 141-200). When
measured to an accuracy of three significant figures, the CS excitatory
weight reached its asymptotic level in 61 trials in stage one but only
required 47 trials to reach the same level in stage three. This effect
occurs because, during reacquisition, the CS excitatory weight begins at
a higher level than during the initial acquisition process. It may be
noted that this prediction of the drive-reinforcement model differs from
that of the Rescorla-Wagner (1972) and Sutton-Barto (1981) models in that
the latter two models do not predict the more rapid reacquisition of
conditioned responses.

Figure 14. Results of a simulated three-stage classical conditioning
experiment in which the drive-reinforcement model's predicted rate of
reacquisition of a CR in stage 3 (trials 141-200) after extinction in
stage 2 (trials 71-140) is compared with the predicted rate of initial
acquisition in stage 1 (trials 1-70). Consistent with experimental
evidence demonstrating that reacquisition occurs more rapidly, the
drive-reinforcement model predicts that acquisition in stage 1 will
require 61 trials as compared with 47 trials for reacquisition in stage
3. (See text and Appendix for details.)

Backward conditioning

In backward conditioning, the onset of the US precedes the onset of
the CS. There have been conflicting reports regarding whether backward
conditioning leads to conditioned excitation or conditioned inhibition
(e.g., see review by Gormezano, Kehoe, and Marshall, 1983). Mahoney and
Ayres (1976) sought to design experiments that would clarify some of the
issues and they concluded that conditioned excitation did result from
backward conditioning. At this time, the consensus appears to be that
backward conditioning can lead to conditioned excitation initially but
that extended backward conditioning usually yields conditioned inhibition
(Pavlov, 1928; Rescorla, 1969; Wagner and Terry, 1975; Heth, 1976;
Schwartz, 1984; Flaherty, 1985; Dolan, Shishimi, and Wagner, 1985). The
initial conditioned excitation may be due to transient effects associated
with global brain processes such as arousal triggered by the onset of the
surprising US. In this view of backward conditioning, the hypothesized
underlying process is one of conditioned inhibition which prevails with
extended conditioning, after the US has come to be expected. The
predictions of the drive-reinforcement model are consistent with this
hypothesis, as can be seen in Figure 15. In Figure 15(a), forward
conditioning is shown for a CS, the onset of which occurs two time steps
before the onset of the US. In Figure 15(b), backward conditioning is
shown for the same CS and US, in this case with the onset of the CS
following the onset of the US by two time steps. The drive-reinforcement
model predicts that backward conditioning will lead to conditioned
inhibition, consistent with the experimental results obtained in most
cases of extended backward conditioning. However, regarding these
experimental results, J. W. Moore (personal communication, June 18, 1986)
suggests that one caveat is in order: "... no studies have used the
requisite combination of summation and retardation tests to assess the
presumed learned inhibitory properties instilled by backward training."

Figure 15. Results of simulated classical conditioning experiments in
which the drive-reinforcement model's predictions for (a) forward and (b)
backward conditioning are compared. Consistent with the experimental
evidence, the model predicts that conditioned inhibition will result from
backward conditioning, in contrast to conditioned excitation being
predicted as the result of forward conditioning. (See text and Appendix
for details.)

Blocking and overshadowing

Temporal contiguity between a CS and US is fundamental to classical
conditioning. This has long been understood to be the case. But while
temporal contiguity is necessary, Kamin (1968, 1969) has demonstrated
that it is not sufficient. Kamin has shown that a CS must also have
predictive value. That is to say, there must be a contingent
relationship between the CS and US as well as a relationship of temporal
contiguity; otherwise, no conditioning will occur. Kamin demonstrated
this by first reinforcing CS1 with a US until an asymptotic level of
associative strength was reached. Then he added CS2 such that CS2 was
presented simultaneously with CS1 and both were reinforced. Kamin showed
that no or very little associative strength developed between CS2 and the
US. The first CS was said to have blocked conditioning of the second CS.

The drive-reinforcement model predicts the phenomenon of blocking,
as can be seen in Figure 16. In this simulated blocking experiment, CS1
is reinforced by the US in the first stage of conditioning (trials
1-100), until the CS1 excitatory weight has approached its asymptotic
level. Then, in stage two of conditioning (trials 101-160), CS1 and CS2
are presented simultaneously and reinforced with the US. It is seen that
the CS2 excitatory weight remains unchanged during the second stage of
conditioning. Consistent with the experimental evidence, the
drive-reinforcement model predicts that conditioning of CS2 will be
blocked by CS1, due to the previous conditioning of CS1.

US preexposure effects may be due to the phenomenon of blocking
(Mis and Moore, 1973). If an animal experiences a number of US
presentations prior to experiencing paired presentations of a CS and
the US, the result is that the conditioning process is retarded.
This effect may be due to the experimental context, during US
preexposure, becoming a blocker for subsequent conditioning [see
review by Flaherty (1985) and, e.g., Balsam and Schwartz (1981)].

A question in animal learning theory has been whether
contingency aspects of classical conditioning derive from
limitations on the amount of associative strength available so
that, in effect, stimuli must compete for the available associative
strength (Rescorla and Wagner, 1972) or whether, in effect, stimuli
must compete for an animal's attention (Sutherland and Mackintosh,
1971; Mackintosh, 1975; Moore and Stickney, 1980, 1985). The
alternative hypotheses are not mutually exclusive. The
drive-reinforcement neuronal model's predictions are consistent with
the hypothesis that there are limitations on the associative
strength available to stimuli. However, the neuronal model does not
rule out the involvement of higher level attention mechanisms.

In the case of the drive-reinforcement model, it can be seen that
the limits on Δy(t) serve to limit the amount of associative strength
available to competing stimuli. y(t) is bounded such that it is less
than or equal to y'(t), the maximal frequency of firing of the neuron.

Figure 16. The drive-reinforcement model's predictions of the effects of
a blocking stimulus. Consistent with the experimental evidence, the
model predicts that after delay conditioning of CS_1 in stage 1 (trials
1-100), conditioning of CS_2, presented simultaneously with CS_1 in stage 2
(trials 101-160), will be blocked. The CS_2 excitatory synaptic weight,
w, does not change in stage 2. (See text and Appendix for details.)


For nonoverlapping CSs and USs, the upper bound on y(t) may actually be
less than y'(t) because, in this case, y(t) never exceeds the amplitude
of the UR. Thus, as was seen in Figure 16, if CS_1 has been reinforced
until an asymptotic level of conditioning is reached, subsequent
conditioning of a second stimulus, CS_2, will be blocked if the second
stimulus forms a compound with the first and the onsets of CS_1 and CS_2
are simultaneous. What happens is that in stage 1 of conditioning, the
positive Δx_1(t-j) associated with CS_1 interacts with the subsequent
positive Δy(t) induced by the onset of the US, causing CR_1 to grow and
thus diminishing Δy(t) with each trial. Eventually the positive Δy(t)
associated with US onset diminishes to the point where its effect is
cancelled by the subsequent effect of the negative Δy associated with US
offset. The amplitude of CR_1 has grown to the point where there is no
room for the generation of a net positive Δy subsequent to a positive
Δx_2 when CS_2 is introduced as part of a compound. Thus, consistent with
the experimental evidence and consistent with the hypothesis of Rescorla
and Wagner (1972) that there are limitations on the associative strength
available to stimuli, the drive-reinforcement model predicts that
conditioning will be blocked with respect to CS_2.
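
The "no room" argument can be illustrated with a few lines of arithmetic. The following Python sketch is not the simulation that produced Figure 16. It assumes nonoverlapping CS and US pulses, a UR of fixed amplitude, and treats the CR amplitude as a given number rather than computing it from synaptic weights. The function names, the default bound of 1.0, and the amplitudes are illustrative assumptions.

```python
def bounded_output(y, y_max=1.0):
    """Clip the output y(t) to the range [0, y_max]; y_max stands in for y'(t)."""
    return max(0.0, min(y, y_max))


def delta_y_at_us_onset(cr_amplitude, ur_amplitude, y_max=1.0):
    """Change in output when the response steps from the CR to the UR.

    Assumes nonoverlapping CS and US, so y(t) equals the CR amplitude just
    before US onset and the UR amplitude just after it.
    """
    before = bounded_output(cr_amplitude, y_max)
    after = bounded_output(ur_amplitude, y_max)
    return after - before


# As the CR grows over trials (hypothetical amplitudes), the positive
# change in output at US onset shrinks toward zero.
ur = 0.5
steps = [delta_y_at_us_onset(cr, ur) for cr in (0.0, 0.2, 0.4, 0.5)]
```

Under these assumptions the positive change in output at US onset falls to zero once the CR amplitude reaches the UR amplitude, which is the situation a newly introduced second CS would encounter.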

A variant of blocking is overshadowing (e.g., Baker, 1968; Couvillon
and Bitterman, 1982), first reported by Pavlov (1927), in which two or
more simultaneous CSs are reinforced in a single stage of conditioning.
In this type of experiment, it is observed that the more salient stimulus
acquires the greatest associative strength, in effect, partially blocking
conditioning of the other stimuli. The drive-reinforcement model predicts
overshadowing, as may be seen in Figure 17. In this simulated classical
conditioning experiment, three simultaneous CSs are reinforced by a US.
CS_1 and CS_3 are of equal amplitude. The amplitude of CS_2 is twice that
of either of the other two CSs. Consistent with the experimental
evidence, the drive-reinforcement model is seen to predict that the CS_2
excitatory weight will achieve a substantially higher asymptotic value
than the equal and lower asymptotic values achieved by the CS_1 and CS_3
excitatory weights. This effect occurs with the drive-reinforcement
model because the change in the presynaptic frequency of firing upon
CS onset is greater for CS_2 than it is for CS_1 or CS_3. Thus, the CS_2
excitatory weight increases more rapidly, taking up a larger fraction of
the total available associative strength than either the CS_1 or CS_3
excitatory weights.
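
The role of CS amplitude can be made concrete with a one-step calculation. The Python sketch below assumes a simplified, single-term form of the learning rule in which the weight change is proportional to the change in output, the absolute value of the weight, and the positive change in presynaptic signal at CS onset. The function name, the constant c, the starting weight of 0.1, and the output change of 0.5 are assumptions made for illustration; only the three amplitudes come from the Figure 17 caption.

```python
def weight_increments(cs_amplitudes, weights, delta_y, c=1.0):
    """One-step weight changes, assuming dw_i = c * delta_y * |w_i| * dx_i.

    dx_i is taken to be the CS onset amplitude (a step up from zero);
    negative changes are ignored, as in the constrained model.
    """
    return [c * delta_y * abs(w) * max(dx, 0.0)
            for w, dx in zip(weights, cs_amplitudes)]


# Two CSs of amplitude 0.2 and one of amplitude 0.4, equal starting weights.
dw = weight_increments([0.2, 0.4, 0.2], [0.1, 0.1, 0.1], delta_y=0.5)
```

On the first reinforced trial the more salient CS receives twice the increment of the others. Because the rule also scales with the weight itself, that advantage would be expected to compound on later trials, although this sketch does not simulate them.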

Compound  conditioning 

In  compound  conditioning,  multiple  CSs  are  presented  simultaneously 
or  sequentially  for  reinforcement  (or  for  nonreinforcement).  Compound 
CSs  have  appeared  in  some  of  the  simulated  classical  conditioning 
experiments  discussed  above,  including  those  experiments  involving 
conditioned  inhibition,  blocking,  and  overshadowing. 

A compound conditioning experiment reported by Rescorla and Wagner
(1972) can be utilized as a test of the drive-reinforcement model. The
experimental results were obtained, Rescorla and Wagner (1972) note, in a
previously unpublished study due to Wagner and Saavedra. The experiment
involved comparing the effects of two CS-US configurations. In one
configuration, CS_1 occurring alone was reinforced and also CS_1 paired
with CS_2 was reinforced. An example of such a CS-US configuration
appears in Figure 18(a). In the second configuration, an example of
which is shown in Figure 18(b), CS_1 occurring alone was not reinforced;
only CS_1 paired with CS_2 was reinforced. In the case of the first
configuration, where both CS_1 alone and CS_1-CS_2 paired were reinforced,
the asymptotic associative strength of CS_1 was observed to be high and
that of CS_2 was observed to be low. The ranking of the asymptotic
associative strengths reversed when the second configuration was
employed, in which CS_1 alone was not reinforced and CS_1-CS_2 paired was
reinforced. These results are predicted by the drive-reinforcement
model, as can be seen in Figure 18. In effect, what happens is that the
CS that more reliably predicts the US comes to block the other CS.

Figure 17. The drive-reinforcement model's predictions of the effects of
stimulus salience on compound conditioning. Consistent with the
experimental evidence, the model predicts that a more salient stimulus,
CS_2, which has an amplitude of 0.4, will condition more rapidly and
strongly than less salient stimuli, CS_1 and CS_3, each with an amplitude
of 0.2. The asymptotic excitatory synaptic weight for CS_2 is more than
double that of either the CS_1 or CS_3 asymptotic excitatory synaptic
weights. Thus, the drive-reinforcement model predicts the phenomenon of
overshadowing. (See text and Appendix for details.)

Space limitations preclude the presentation of additional results of
computer simulations of compound conditioning experiments. However, two
other compound conditioning effects that are predicted by the
drive-reinforcement model should be noted. In the case of the
overexpectation paradigm, two stimuli, CS_1 and CS_2, are first
individually conditioned to an asymptotic level, each stimulus being
reinforced with the same US. Then, in a second stage of conditioning,
the two stimuli are presented as a compound that is reinforced utilizing
the same US as in the first stage. Rescorla and Wagner (1972) and Kremer
(1987) report that the associative strengths of the two stimuli decrease
in the second stage of conditioning. Furthermore, if an initially
neutral stimulus, CS_3, is presented in compound with CS_1 and CS_2 in the
second stage of conditioning, CS_3 becomes a conditioned inhibitor. The
drive-reinforcement model predicts these effects.

Figure 18. Results of simulated compound conditioning experiments in
which the drive-reinforcement model's predictions for reinforced and
nonreinforced CSs are compared. Consistent with the experimental
evidence, in (a) the model predicts strong conditioning of CS_1 relative
to CS_2, where both CS_1 alone and the CS_1-CS_2 pair are reinforced. Again
consistent with the experimental evidence, in (b) the model predicts that
the ranking of associative strengths for CS_1 and CS_2 will be reversed
with respect to (a) when CS_1 alone is not reinforced and the CS_1-CS_2 pair
is reinforced. (See text and Appendix for details.)

In the case of superconditioning, the compound that is reinforced
consists of two stimuli, one initially neutral and the other a
conditioned inhibitor by virtue of prior conditioning. Reinforcement of
this compound is observed to yield an asymptotic associative strength for
the initially neutral stimulus that is greater than the corresponding
associative strength in a control experiment in which both stimuli are
initially neutral (Rescorla, 1971; Wagner, 1971). The
drive-reinforcement model predicts this effect.

Discriminative stimulus effects

The simulated classical conditioning experiments discussed above,
the results of which were shown in Figure 18, involved compound
conditioning and discrimination learning. Discrimination learning
experiments test an animal's ability to discriminate between reinforced
and nonreinforced CSs. A more complex example of a compound conditioning
experiment that tests for discriminative stimulus effects is shown in
Figure 19(a), where the compound CS_1-CS_3 is reinforced and the compound
CS_2-CS_3 is not reinforced. For this CS-US configuration, experimental
evidence reviewed by Rescorla and Wagner (1972) suggests that the
asymptotic associative strengths will be high for CS_1, low for CS_3 and
zero for CS_2. Actually, CS_2 is observed in the experiments to become a
conditioned inhibitor. It is seen in Figure 19(a) that the
drive-reinforcement model predicts these results. The
drive-reinforcement model predicts that the combined associative
strengths of CS_2 and CS_3 will increase initially and then decrease. This
transient effect predicted by the model has been observed by
experimentalists, as Rescorla and Wagner (1972) note.

Figure 19. Results of simulated compound conditioning experiments in
which the drive-reinforcement model's predictions of the effects of
discriminative stimuli were determined for a more complex case than that
portrayed in Figure 18. (See text and Appendix for details.)

A CS-US configuration similar to that shown in Figure 19(a) is shown
in Figure 19(b). Rescorla and Wagner (1972) review the results of a
study by Wagner, Logan, Haberlandt, and Price (1968) in which the
discriminative stimulus effects of the CS-US configuration shown in
Figure 19(a) were compared with the effects of the CS-US configuration
shown in Figure 19(b). The CS-US configuration shown in Figure 19(b)
represents a "pseudodiscrimination" experiment in that both compound CSs
are reinforced sometimes and both are nonreinforced sometimes so it is
actually a partial reinforcement experiment. Because of the similarity
between the CS-US configurations in panels (a) and (b) of Figure 19, it
is of interest to compare the experimental outcomes. It was found by
Wagner et al. (1968) that while CS_3 was reinforced an equal fraction of
the time in both the discrimination and the pseudodiscrimination training
and occurred in compound with the same CSs, the eventual associative
strength of CS_3, when tested alone, was much greater after
pseudodiscrimination training than after discrimination training. This
is predicted by the drive-reinforcement model, as can be seen by
comparing the asymptotic synaptic weights for CS_3 in panels (a) and (b)
of Figure 19. The net CS_3 asymptotic synaptic weight (i.e., the CS_3
asymptotic excitatory weight minus the absolute value of the CS_3
asymptotic inhibitory weight) in Figure 19(b) is approximately double
that of the net CS_3 asymptotic synaptic weight in Figure 19(a). It
should be noted that the Rescorla-Wagner (1972) and Sutton-Barto (1981)
models also correctly predict the experimental outcomes of the
discrimination and pseudodiscrimination experiments just discussed,
including the transient increase in the associative strength of the
CS_2-CS_3 compound stimulus in the case of the discrimination training.


A variant of the drive-reinforcement neuronal model

The drive-reinforcement neuronal model, as specified above, requires
that a positive change in presynaptic signal level occur in order that a
synapse be rendered eligible for a change in its efficacy. It was noted
earlier that the best argument for this constraint is that it yields a
neuronal model that is consistent with the experimental evidence of
classical conditioning. When the constraint is lifted so that Δx_i(t-j)
in equation (2) is allowed to assume any value, positive or negative, the
neuronal model then frequently generates predictions that deviate
substantially from the experimental evidence. An example is shown in
Figure 20. The simulated classical conditioning experiment reported in
Figure 20 is identical to one reported in Figure 16 (which was a blocking
experiment) except that, in the case of Figure 20, Δx_i(t-j) did not have
to be greater than zero for the learning mechanism to be triggered. With

this constraint removed, the CS excitatory synaptic weights
[illegible] because the negative Δx occurring at the time of CS
offset is multiplied by the negative Δy occurring at the time of US
offset. The behavior that is plotted in Figure 16 is a clear-cut
[illegible] of the blocking phenomenon while that plotted in Figure
20 bears no discernible relationship to experimentally observed behavior
in the case of blocking experiments.

Figure 20. Results of a simulated blocking experiment that was identical
to that reported in Figure 16 except that the drive-reinforcement model
utilized to generate the predictions of Figure 16 rendered a synapse
eligible for a change in its efficacy, w_i, only upon the occurrence of a
positive change in the presynaptic signal level. To generate the
predictions shown here, a variant of the drive-reinforcement model was
employed, such that both positive and negative changes in presynaptic
signal levels rendered a synapse eligible for a change in its efficacy.
It is seen that the variant of the model employed here yields predictions
that deviate markedly from the experimental evidence. Because these
deviations are typical of this variant of the drive-reinforcement model,
the other variant of the model, utilized to generate the predictions
shown in Figures 4 through [illegible], seems more likely to reflect the
function of biological neurons. (See text and Appendix for details.)
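
The difference between the constrained model and its variant can be sketched for a single pairing of events. The Python code below assumes a one-term form of the learning rule, in which the weight change is the product of a constant, the change in output, the absolute value of the weight, and an earlier change in presynaptic signal. The function names, the flag `positive_only`, the constant c, and the numerical magnitudes are illustrative assumptions, not values taken from the simulations.

```python
def eligibility(dx, positive_only=True):
    """Presynaptic change used by the learning rule.

    positive_only=True mimics the constraint described in the text: only
    positive changes in presynaptic signal level make a synapse eligible.
    positive_only=False corresponds to the variant with the constraint lifted.
    """
    return max(dx, 0.0) if positive_only else dx


def weight_change(dx_earlier, delta_y, weight, c=1.0, positive_only=True):
    """One-term form of the assumed rule: dw = c * delta_y * |w| * dx."""
    return c * delta_y * abs(weight) * eligibility(dx_earlier, positive_only)


# CS offset (negative dx) followed by US offset (negative delta_y).
constrained = weight_change(-0.2, -0.5, 0.1, positive_only=True)
variant = weight_change(-0.2, -0.5, 0.1, positive_only=False)
```

With the constraint in place, the offset pairing leaves the weight unchanged. In the variant, the two negative changes multiply to give a positive increment, so CS offset followed by US offset strengthens the excitatory weight.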

Summary 


By means of computer simulations of the drive-reinforcement neuronal
model, it has been shown that the model correctly predicts classical
conditioning phenomena in the following basic categories: delay and
trace conditioning, conditioned and unconditioned stimulus duration and
amplitude effects, partial reinforcement effects, interstimulus interval
effects including simultaneous conditioning, second-order conditioning,
conditioned inhibition, extinction, reacquisition effects, backward
conditioning, blocking, overshadowing, compound conditioning, and
discriminative stimulus effects.


SECTION  4 

DRIVES  AND  REINFORCERS 

The behavior of the proposed neuronal model may be understood in
terms of two processes involving postulated neuronal drives and
reinforcers. If weighted presynaptic signal levels are defined to
be neuronal drives and weighted changes in presynaptic signal
levels are defined to be neuronal reinforcers, then the
drive-reinforcement learning mechanism operates such that neuronal
drive induction promotes learned excitatory processes and neuronal
drive reduction promotes learned inhibitory processes. The
interplay between these two processes yields the classical
conditioning phenomena discussed above.

In this section, definitions of drives and reinforcers at the level
of the single neuron and at the level of the whole animal will be
examined further. Then the relationship of the drive-reinforcement
neuronal model to animal learning theory will be discussed. I will begin
by offering precise definitions of drives and reinforcers, definitions
motivated by the neuronal model as it may be viewed in the context of
animal learning theory.

Definitions 

For the drive-reinforcement neuronal model, neuronal drives are
defined to be the weighted presynaptic signals, w_i(t) x_i(t). These
weighted presynaptic signals drive the neuron. Equation (1) is termed
the drive equation because it specifies how neuronal drives, w_i(t) x_i(t),
are transformed into neuronal behavior, y(t). Neuronal reinforcers are
defined to be the weighted changes in presynaptic signal levels, w_i(t)
Δx_i(t). Neuronal reinforcement results from the net effect of all of
the weighted Δx_i's experienced by a neuron at time, t. Neuronal
reinforcers thus manifest as Δy(t) and neuronal reinforcement is defined
to be equal to Δy(t). Note the distinction here: A neuronal
reinforcer is a weighted change in signal level that the neuron
experiences at a single synapse; neuronal reinforcement is defined to be
the collective effect of the neuronal reinforcers, manifesting as the
change in output, Δy(t). Incremental neuronal reinforcement is defined
to be an increase in the postsynaptic frequency of firing and decremental
neuronal reinforcement is defined to be a decrease in the postsynaptic
frequency of firing, with both increases and decreases in firing
frequency measured over intervals not exceeding a few seconds.
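
These definitions translate directly into code. The Python sketch below assumes that equation (1) has the form of a weighted sum of presynaptic signals minus a threshold, and it omits the bounds on the output for brevity. The function names, the threshold parameter `theta`, and the two-synapse example values are illustrative assumptions.

```python
def drives(weights, signals):
    """Neuronal drives: the weighted presynaptic signals w_i(t) * x_i(t)."""
    return [w * x for w, x in zip(weights, signals)]


def reinforcers(weights, prev_signals, signals):
    """Neuronal reinforcers: the weighted changes w_i(t) * dx_i(t)."""
    return [w * (x - xp) for w, x, xp in zip(weights, signals, prev_signals)]


def output(weights, signals, theta=0.0):
    """Sum of the drives minus a threshold (assumed form of equation (1))."""
    return sum(drives(weights, signals)) - theta


# Hypothetical two-synapse example with weights held fixed over one time step.
w = [0.5, 0.25]
x_prev, x_now = [0.0, 1.0], [1.0, 1.0]
reinforcement = output(w, x_now) - output(w, x_prev)  # change in output
```

In this example the sum of the individual reinforcers equals the change in output, which illustrates the sense in which neuronal reinforcers manifest collectively as neuronal reinforcement. The equality holds here because the weights are fixed and the output is not clipped.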

In the drive-reinforcement neuronal model, changes in presynaptic
signal levels play two roles. A change in presynaptic signal level,
Δx_i(t), renders the i-th synapse eligible for future reinforcement. The
synaptic weight, w_i, for such an eligible synapse changes if a
subsequent change in postsynaptic signal level, Δy, occurs not more than
τ time steps in the future. The other role for Δx_i(t), when weighted by
w_i(t), is to contribute to (i.e., partially or wholly cause) Δy(t) and
thus reinforce synapses rendered eligible by earlier changes in
presynaptic signal levels. In effect, Δx_i(t) looks to the future with
regard to its role in rendering a synapse eligible for reinforcement and
to the past in contributing to the reinforcement of synapses rendered
eligible earlier. Equation (2), the neuronal learning
mechanism, is termed the reinforcement equation because it specifies how
neuronal reinforcers [w_i(t) Δx_i(t)'s manifesting collectively as Δy(t)]
are transformed into changes in behavior [due to Δw_i(t)'s]. Thus, we
see that equation (1), the drive equation, involves the processing of
signal levels to yield behavior and equation (2), the reinforcement
equation, involves the processing of changes in signal levels to yield
learning.
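
The eligibility window can be sketched as follows. Equation (2) is not reproduced in this section, so the Python code below assumes a form consistent with the description above: the weight change at time t is the current change in output multiplied by a sum, over the preceding τ time steps, of positive presynaptic changes, each scaled by a constant and by the absolute value of the weight at that earlier step. The function name, the list-based data layout, and the constants in `c` are illustrative assumptions.

```python
def delta_w(i, t, x, w, y, c):
    """Weight change for synapse i at time t under the assumed rule:

    dw_i(t) = dy(t) * sum_{j=1..tau} c[j-1] * |w_i(t-j)| * max(dx_i(t-j), 0)

    x[i] and w[i] are lists indexed by time step, y is the output history,
    and tau is taken to be len(c).
    """
    dy = y[t] - y[t - 1]
    total = 0.0
    for j in range(1, len(c) + 1):
        if t - j - 1 < 0:
            break  # not enough history to form dx_i(t-j)
        dx = x[i][t - j] - x[i][t - j - 1]
        total += c[j - 1] * abs(w[i][t - j]) * max(dx, 0.0)
    return dy * total


# CS onset at step 1, output rises at step 2: inside the window (tau = 2).
x_a, w_a, y_a = [[0.0, 1.0, 1.0]], [[0.5, 0.5, 0.5]], [0.0, 0.5, 1.0]
inside = delta_w(0, 2, x_a, w_a, y_a, c=[1.0, 0.5])

# CS onset at step 1, output rises at step 5: outside the window.
x_b, w_b = [[0.0, 1.0, 1.0, 1.0, 1.0, 1.0]], [[0.5] * 6]
y_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
outside = delta_w(0, 5, x_b, w_b, y_b, c=[1.0, 0.5])
```

The first case changes the weight because the change in output follows the presynaptic change within τ steps. The second leaves it unchanged because the synapse is no longer eligible when the output changes.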

It was noted earlier that the drive-reinforcement learning mechanism
moves the onsets and offsets of pulse trains to earlier points in time.
It should also be noted that, in doing this, the learning mechanism sets
up the possibility of a chain of reinforcing events. Because of the way
Δx's and Δy's interact in the model to yield Δw's, Δy's come to occur
earlier in time, making them available to reinforce even earlier Δx's.
Thus, chains of reinforcing Δx's and Δy's can be established beginning
with a primary reinforcer (which will be defined below).

While the drive-reinforcement neuronal model appears complex
relative to, say, the Hebbian model, this seems appropriate because the
single neuron is coming to be recognized as a highly sophisticated cell.
None of the operations proposed here seem incompatible with the known
capabilities of the single neuron (e.g., see Woody, 1982, 1986).

The terms I have defined at a neuronal level mirror terms animal
learning researchers have defined at the level of the whole animal.
Additional terms may be defined in this way. For example, innate or
primary neuronal drives may be distinguished from acquired neuronal
drives. Primary neuronal drives are defined to have fixed synaptic
weights. Acquired neuronal drives are defined to have variable synaptic
weights, under the control of the neuronal learning mechanism. Primary
neuronal drives will include deficit related signals having an internal
source (drives to eat and drink are examples) and unconditioned stimuli
having an external source (food and water are examples). Acquired
neuronal drives, likewise, are expected to have internal sources (as the
result of possible conditioning, for example, of the hypothalamic reward
and punishment centers) and to have external sources in the case of what
become conditioned stimuli. The notion of acquired drives was first
suggested by Miller (1951).

Psychologists  have  generally  defined  drives  to  include  only  the 
category  of  deficit  related  internal  signals.  I  am  broadening  the 
definition  to  include  any  signal  that  drives  a  neuron.  My  definition  of 
primary  drives  comes  closer  to  the  conventional  definition  of  drives  but, 
in  this  case,  I  still  include  (external)  unconditioned  stimuli  as  well  as 
(internal)  deficit-related  signals.  My  point  in  changing  the  definition 
is  to  suggest  that  drives,  defined  in  this  broader  fashion  and  at  a 
neuronal level, can serve as the basis for a simpler and more rigorous
learning  theory. 

I have noted that neuronal drives can have internal and external
sources and can be primary (innate) or acquired. The same is true of
neuronal reinforcers as they have been defined above. Unconditioned
stimuli, for example, function as primary drives, yielding unconditioned
responses. Unconditioned stimuli also function as primary reinforcers,
yielding conditioned responses. The drive-reinforcement model suggests
that when an unconditioned stimulus is functioning as a neuronal drive,
it is the signal level, itself, that is important [see equation (1)] and
when an unconditioned stimulus is functioning as a neuronal reinforcer,
it is the onset and offset of the signal that is important [see equation
(2)].

I have defined drives and reinforcers in a straightforward fashion
at a neuronal level. However, such clear-cut definitions have not proved
to be possible at the level of the whole animal. For example, Toates
(1985, p. 963) remarks that the notion of drive "has been around for a
long time. No one seems to know quite why we need the concept, but we
keep putting it on display. It tends therefore to assume a variety of
uncertain functions." I am going to argue that we should not be
surprised by this state of affairs. In the history of animal learning
research, it has not been unusual for the notions of drives and
reinforcers to be seen as problematic. When such notions are invoked at the
level of the whole animal, this may be understandable. If the notions of
drive and reinforcement are relatively straightforward at the level of
the single neuron, as I am suggesting here, then we should not
necessarily expect such notions to also be straightforward at higher
levels. If neurons are classically conditionable cells in their own
right, as the drive-reinforcement model suggests, then when millions or
billions of such neurons interact in phylogenetically advanced nervous
systems, the interactions may not be simple. That we can make as much
sense out of whole brain function as we have, thanks to the dedication of
animal learning researchers and many others, might even be seen as
surprising, considering the complexity of the neural network of, say, a
dog. That Pavlov (1927) and those who worked with and after him were
able to see their way through to a relatively clear view of classical
conditioning suggests that brain function may not be as complex as we
might have expected. However, as Gray (1975) demonstrates in an
especially careful and incisive analysis, complications arise with the
notions of drives and reinforcers at the level of the whole animal.

If the notion of drive has been problematic at the level of the
whole animal, what about the notions of drive reduction and drive
induction, postulated to function as reinforcers (e.g., see Mowrer,
1960)? I have suggested that, at a neuronal level, drive reduction and
induction have straightforward roles to play with respect to the process
of reinforcement. Assuming for the moment that the hypothesized
drive-reinforcement neuronal model is correct, how might we expect
neuronal drive reduction and induction to map onto the level of the whole
animal? Let us consider an example. The global reward or "pleasure"
centers discovered by Olds and Milner (1954) are known to be inhibitory
(Fuxe, 1965) so they would be expected to yield decremental neuronal
reinforcement. However, we know that the salivary reflex is excited by
the taste of food. Also, the brain's global reward centers are
presumably excited by the taste of food but they will, in turn, deliver
inhibition throughout the nervous system. This inhibition, in some
cases, is likely to reach inhibitory interneurons and, thus, in effect,
could be translated into excitation. Disinhibition is known to play a
major role in the nervous system (Roberts, 1980). We can see then that
there will be no clear-cut, simple mapping of excitation and inhibition
into drives. Neither should we expect increases and decreases in
excitation and inhibition (neuronal drive reduction and induction) to map
in a clear-cut, simple way into global reinforcement (i.e., reward and
punishment). In each case, the involved neural network will have to be
considered before any mapping of neuronal drives and reinforcers into
global drives and reinforcers can be established.

Evidence for this kind of complexity has been obtained by Keene
(1973). Olds (1977, p. 95) has summarized Keene's findings as follows:
"A family of neurons excited by aversive brain shocks and inhibited by
rewarding ones was identified in the intralaminar system of the thalamus;
and a second family accelerated by rewards and decelerated by punishments
was observed with probes in the preoptic area." Keene's results
demonstrate that the brain's global processes of reward and punishment
can have opposite effects in different parts of the nervous system.
Thus, the practical complexity of this situation at the level of the
whole animal, reflecting perhaps the pragmatic decisions of the
evolutionary process, may account for the problematic history of the
psychological notions of drives and reinforcers.

Relationship of the drive-reinforcement neuronal model to animal
learning theory

Having defined drives, reinforcers, and related terms at a neuronal
level, and having acknowledged the complexities that arise around these
concepts at the level of the whole animal, I will now discuss how the
drive-reinforcement neuronal model relates to theories of animal
learning.

In this century, the study of learning began with stimulus-response
(S-R) association psychology (Thorndike, 1911; Pavlov, 1927; Guthrie,
1935). In place of S-R association psychology, the drive-reinforcement
neuronal model suggests what could be called ΔS-ΔR association
psychology. The neuronal model suggests that it is not stimuli and
responses that are associated but, rather, changes in stimuli and changes
in responses. Except, of course, in the theoretical model I am proposing,
it is neuronal ΔS's and ΔR's that are associated, not the ΔS's and
ΔR's of the whole animal. At the level of the whole animal, we can
expect a more complicated situation, as I have already discussed.

Hull (1943) confronted the complexities that arise at the level of
the whole animal. As Hilgard and Bower (1975) note, Hull, in his
herculean effort to systematize learning theory, was moving psychology
from an S-R formulation to an S-O-R formulation, where "O" represented
the state of the organism. Central to Hull's (1943) theory of learning
was the definition of reinforcement as drive reduction. Hull (1952) went
on to revise his position, concluding that reinforcement should be
defined as drive-stimulus reduction. Actually, Hilgard and Bower (1975,
p. 167) observe that "While favoring drive-stimulus reduction, Hull left
the matter somewhat open, having vacillated between drive reduction and
drive-stimulus reduction as essential to reinforcement" (Hull, 1952, p.
153). The drive-reinforcement neuronal model suggests that Hull may have
been right on both counts; both drive reduction and drive-stimulus
reduction may function as reinforcers because both can result in Δy's.
Thus, at a neuronal level, the distinction between drive reduction and
drive-stimulus reduction appears to dissolve. We see a reason why drives
should probably be defined more broadly than Hull considered.

Hull's narrower definition of drives resulted in another problem for
his theory. Hull's identification of drives and drive reduction with
physiological needs or tissue deficits did not seem to leave room for
such phenomena as animal play and the learning that results. Mishkin and
Petri (1984, p. 292) point out that "Shortly after Hull developed [his]
ideas, a number of studies on curiosity, manipulation and exploration
suggested that other motives, not obviously related to physiological
needs, also generated learning." Mishkin and Petri go on to say that
"The recognition that there are motives that have no apparent basis in
tissue deficits or other physiological needs was one major factor that
eventually led to the demise of the drive reduction theory of learning
(Bolles, 1967)." The drive-reinforcement neuronal model solves the
problems encountered with Hull's theories by moving from the level of the
whole animal to the level of the single neuron, by suggesting a broader
definition of drives, by allowing both drive reduction and drive
induction to be reinforcing [consistent with Mowrer (1960)], and by not
necessarily identifying drive reduction with reward.

Regarding the relationship of drive reduction to reward, Gray (1975)
discusses the question of whether rewards and punishments should be
associated with drive decrements and increments, respectively. Based on
Gray's analyses and those of others whom he cites, I have come to the
conclusion that too close an identification of drive reduction with
reward may not be warranted. The Darwinian process may have been more
flexible in its approach as it evolved nervous systems. Therefore, I
will not, in the theoretical framework I am proposing in this report,
identify drive decrements with reward and drive increments with
punishment even though, as generalizations, such identifications
may be valid. There is nothing in the theoretical framework that
requires such a rigid identification to make the theory workable.

After Hull, animal learning theory's next major step forward was
due, in my opinion, to Mowrer (1960). A colleague of Hull's at Yale,
Mowrer moved from Hull's drive reduction (or drive-stimulus reduction)
theory to a symmetric theory in which learning was attributed to both
drive reduction and drive induction. Also, in Mowrer's theory, classical
conditioning was accepted as basic. Mowrer's emphasis on classical
conditioning and on symmetric processes in learning has had a strong
influence on the theoretical framework I am proposing in this report.

Since Mowrer proposed his theory, substantial theoretical and
experimental advances have occurred in the understanding of classical
conditioning phenomena. Model systems such as the rabbit nictitating
membrane response are providing a refined understanding of classical
conditioning at psychological and neurobiological levels (e.g., see
Gormezano, 1972; Moore and Gormezano, 1977; Moore, 1979; Gormezano,
Kehoe, and Marshall, 1983; Thompson, 1976; Thompson, Berger, and Madden,
1983). Also, the investigations of Kamin (1968) and Rescorla and Wagner
(1972) have clearly demonstrated contingency aspects of classical
conditioning as distinguished from contiguity aspects.

Along with an increased understanding of classical conditioning has
come a growing conviction on the part of some animal learning theorists
that classical conditioning phenomena are fundamental to animal learning;
instrumental conditioning phenomena are then de-emphasized by these
theorists. Mowrer (1960) early on and Bindra (1976, 1978) more recently
have been leaders in this movement. The drive-reinforcement neuronal
model is consistent with this view. If brains are, fundamentally,
classically conditionable systems, then this may be because they are
composed of classically conditionable neurons, as the drive-reinforcement
model suggests. Instrumental conditioning phenomena are then seen to
arise out of a neural substrate that utilizes classical conditioning
mechanisms. As Bindra (1976, p. 245) has noted: "Once it is explicitly
assumed that the production of any specific instrumental response or of
some of its act components is linked to one or more particular eliciting
stimulus configurations, then the way becomes clear for interpreting
instrumental learning in terms of the learning of stimulus-stimulus
contingencies alone. The problem of instrumental training then becomes
one of making certain response-eliciting stimuli highly potent
motivationally, and this can be done through stimulus-stimulus
contingency learning between the response-eliciting stimulus and the
incentive stimulus." Research on autoshaping in which animals shape
their behavior without a response-reinforcer contingency supports this
position (Brown and Jenkins, 1968; Jenkins and Moore, 1973). As
expressed by Flaherty, Hamilton, Gandelman, and Spear (1977, page 243),
"the law of effect is apparently not necessary for the development of
instrumental-like behavior."

Another way of viewing Bindra's theoretical position is as part of a
movement away from drive reduction theories that emphasize internal
deficit signals and toward incentive-motivation theories (Bindra, 1968;
Bolles, 1972). Incentive-motivation theories suggest that "motivated
behavior results not only from the 'push' of internal, deficit signals
but also from the 'pull' of external, incentive stimuli" (Mogenson and
Phillips, 1976, p. 200, emphasis is that of the quoted authors). It may
be noted that neuronal drives, as defined earlier in this report, include
both internal deficit signals and external incentive stimuli.

While finding myself in sympathy with those who emphasize that
classical conditioning is fundamental to learning, I do not want to go
too far in that direction. Miller and Balaz (1981) note that classical
conditioning has often been seen as involving the learning of
stimulus-stimulus associations while instrumental conditioning has often
been seen as involving the learning of stimulus-response associations or,
in the case of Mackintosh (1974), response-reinforcement associations.
Frequently animal learning theorists have chosen one particular class of
associations as being fundamental and then have ruled out other classes
of associations. Bindra (1976, 1978), for example, suggests that
learning does not have to do with the forming of stimulus-response
associations but rather with the learning of contingencies between
stimuli. This question of which class of associations is fundamental to
learning has been debated by animal learning theorists for decades. The
drive-reinforcement neuronal model suggests that it may not be necessary
to choose one class of associations over another. Solomon (1981, p. 2)
observes: "One persisting question is 'what is learned?' The four
candidates from the past were S-S associations, S-R associations,
R-reinforcer associations and S-reinforcer associations." Solomon goes
on to say: "It appears ... that associations of all four kinds can be
established with the right procedures." The drive-reinforcement model
allows for all four possibilities, suggesting that any of the four
classes of associations will form when neuronal signals representing
stimuli, responses, and reinforcers occur in appropriate temporal
relationships. If a stimulus, response, or reinforcer results in a
positive Δx_i that is followed within the interval, τ, by another
stimulus, response, or reinforcer that yields a Δy at the same neuron,
then an association will form. Thus, an implication of the
drive-reinforcement model is that, at a neuronal level, classical
conditioning, instrumental conditioning, drive-reduction and induction,
response-reinforcement, and incentive-motivation theories may all
describe associations that can form in the nervous system. However, it
is not the presence of signals representing stimuli, responses, or
reinforcers that is required, according to the drive-reinforcement model,
but rather changes in signal levels representing the onsets and offsets
of stimuli, responses, and reinforcers.


A drive-reinforcement theory of learning

What kind of theory of learning is implied then by the
drive-reinforcement neuronal model? At this point, I will sketch one
possible form such a theory might take.

Three principles would appear to be fundamental to what I will call
a drive-reinforcement theory of learning:

(1) Primary neuronal drives are the foundation upon which all
    learning rests.

(2) Neuronal reinforcers are changes in neuronal drive levels.
    Neuronal drive induction promotes learned excitatory processes.
    Neuronal drive reduction promotes learned inhibitory processes.
    Together, these processes yield acquired drives or learning.

(3) The neuronal learning mechanism correlates earlier changes in
    presynaptic signals with later changes in postsynaptic signals,
    yielding changes in the efficacy of synapses. A change in the
    efficacy of a synapse is proportional to the current efficacy
    of the synapse.
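
The third principle can be made concrete with a minimal sketch. The
function below assumes that the weight change at one synapse takes the
form Δw_i(t) = Δy(t) × Σ_j c_j |w_i(t-j)| Δx_i(t-j), summed over
j = 1 to τ and counting only positive Δx_i. This is one reading of the
principles listed above, not a restatement of the report's own
equations. The function name, the argument layout, and the constants
mentioned in the usage note are hypothetical.

```python
def drive_reinforcement_update(w_hist, x_hist, y_hist, c):
    """Change in one synaptic weight at the current time step t.

    w_hist, x_hist: histories of the synaptic weight and the presynaptic
        signal, most recent value last (index -1 is time t). x_hist needs
        at least tau + 2 entries, w_hist at least tau + 1.
    y_hist: history of the postsynaptic signal, at least 2 entries.
    c: learning-rate constants c_1 .. c_tau (assumed values, one per lag).
    """
    tau = len(c)
    delta_y = y_hist[-1] - y_hist[-2]          # later postsynaptic change
    total = 0.0
    for j in range(1, tau + 1):
        # Earlier presynaptic change: x_i(t-j) - x_i(t-j-1)
        delta_x = x_hist[-1 - j] - x_hist[-2 - j]
        if delta_x > 0:                        # only onsets are counted
            # Proportional to the efficacy the synapse had at time t-j
            total += c[j - 1] * abs(w_hist[-1 - j]) * delta_x
    return delta_y * total
```

With, say, c = [5.0, 3.0, 1.5], a weight held at 0.1, a presynaptic
onset one step back, and a postsynaptic rise, the function returns a
positive change; a postsynaptic fall of the same size returns the same
magnitude with a negative sign, and a presynaptic offset returns zero.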

If these principles should turn out to be correct at a neuronal
level, how should we expect such mechanisms to manifest at the level of
the whole animal or what I will call the network level? Neuronal drives
might be expected to emerge at the network level as the positive and
negative feedback loops that control behavior. As examples, consider a
blood glucose detector that provides an internal primary drive signal
(this is what animal learning psychologists have customarily referred to
as a drive) or the taste of food that provides an external primary drive
signal (what animal learning psychologists have customarily referred to
as an unconditioned stimulus). These primary drive signals are parts of
innate negative feedback loops that are associated with what are termed
the hunger drive and the salivation reflex. These feedback loops cause
the blood glucose level to rise because the animal is driven to eat and
assist in causing food to disappear from the mouth and be digested
because the animal is driven to salivate. More generally, feedback loops
representing drives include mating behavior, drinking behavior, behaviors
associated with the approach to and consumption of prey, and behaviors
associated with the attack of or flight from predators. In general,
behaviors can be classified as approach or avoidance (Mowrer, 1960). We
might expect approach behavior to be supported by positive feedback loops
and avoidance behavior to be supported by negative feedback loops.
Positive and negative feedback loops that emerge at the level of the
whole animal will be defined to be network drives, as distinguished from
the neuronal drives defined earlier. Neuronal drives may be seen as the
more atomistic basis of network drives.

Primary network drives are the innate goals of the organism.
Acquired network drives are the learned goals of the organism. On the
basis of the hypothesized drive-reinforcement neuronal learning
mechanism, it is expected that acquired network drives are, in effect,
constructed on top of the primary network drives. When primary (and
acquired) drive levels vary, these variations in drive levels constitute
reinforcement and this reinforcement will spawn new drives (acquired
positive and negative feedback loops). In this way, the process of
learning is hypothesized to be sustained, with drives being built on top
of drives. (Actually, in some cases, the process will not involve the
acquisition of new drives so much as it will the refinement of current
drives.) When acquired network drives become sufficiently complex,
cognitive phenomena may begin to emerge.

To support the process of drive acquisition or learning at the
network level, global centers that can broadcast generalized "start" and
"stop" signals may be helpful. Such signals could serve to introduce
appropriate Δy's in the network at crucial times, thus rendering the
overall activity of the network coherent. Such may be the roles of the
global reward and punishment centers discovered, respectively, by Olds
and Milner (1954) and by Delgado, Roberts and Miller (1954). Consistent
with this idea, global reward centers appear to employ inhibitory
neurotransmitters (Stein, Wise, and Belluzzi, 1977) that may function as
"stop" signals and global punishment centers appear to employ excitatory
neurotransmitters (Stein, Wise, and Belluzzi, 1977) that may function as
"start" signals. That a reward center should generate "stop" signals
might seem paradoxical with respect to some behaviors, but disinhibitory
mechanisms that are prevalent in the nervous system (Roberts, 1980) may
make such an approach workable by enabling releasing mechanisms to be
implemented where necessary. It should also be noted that if
reinforcers are changes in drive levels, then global drive and
reinforcement centers can be one and the same. A center's output will
constitute a drive and a change in a center's output will constitute a
reinforcer. Consistent with this theoretical possibility, drive and
reinforcement centers in the limbic system and hypothalamus appear to be
so close together (Olds, 1977) as to be, perhaps, identical.


SECTION  5 


EXPERIMENTAL  TESTS 


In  the  computer  simulations  reported  above,  the  drive-reinforcement 
neuronal  model  has  been  demonstrated  to  be  consistent,  in  general,  with 
the  experimental  evidence  of  classical  conditioning.  However,  such  a 
demonstration  involves  comparing  theoretical  predictions  of  a  neuronal 
model  with  experimental  evidence  obtained  from  whole  animals.  To  some 
extent,  whole  animal  data  has  to  be  problematic  vis  a  vis  the  predictions 
of a neuronal model. The effects of multiple interacting neurons, the
effects  of  the  brain's  many  interacting  subsystems  and,  in  general,  the 
effects  of  the  global  architecture  of  the  brain  will,  of  course, 
influence  whole  animal  data.  All  of  these  effects,  collectively,  I  will 
refer  to  as  network  effects  to  distinguish  them  from  neuronal  (meaning 
single  neuron)  effects.  Network  effects  will  preclude  rigorous 
experimental  tests  of  any  neuronal  model  in  terms  of  whole  animal  data. 
Tests  at  a  neurobiological  level  will  be  required.  Fortunately,  such 
experimental  tests  are  becoming  feasible  and,  indeed,  results  to  date 
encourage  the  notion  that  classical  conditioning  phenomena  may  manifest 
at  the  level  of  the  single  neuron,  as  the  drive-reinforcement  model 
suggests. [See reviews by Kandel and Spencer (1968), Mpitsos, Collins,
and McClellan (1978), Thompson, Berger, and Madden (1983), Farley and
Alkon (1985), Woody (1986), Carew and Sahley (1986), and Byrne (1987).
See also Hawkins and Kandel (1984) and Kelso and Brown (1986).]
Instrumental conditioning experiments at the level of the single neuron
are also becoming feasible (Stein and Belluzzi, in press).


At this point, perhaps a note is in order regarding the semantics I
am adopting. When I suggest that a single neuron may manifest classical
conditioning phenomena, the "single neuron" I am referring to includes
the synapses that impinge upon it. Those synapses, of course, come from
other neurons or from sensory receptors and, in that sense, what I am
referring to as a phenomenon involving a "single neuron" is, in fact, a
multineuron or neuron and receptor phenomenon. The point, though, is
that a single neuron may be undergoing the conditioning, as distinguished
from  alternative  theoretical  models  that  can  be  envisioned  in  which  whole 
circuits  consisting  of  many  neurons  would  be  the  lowest  level  at  which 
conditioning  could  occur.  An  implication  of  the  drive-reinforcement 
neuronal  model  is  that  classical  conditioning  is  not  an  emergent 
phenomenon  but,  rather,  that  the  ability  to  undergo  classical 
conditioning  is  a  fundamental  property  of  single  cells. 

Actually, the hypothesized drive-reinforcement learning mechanism
could be implemented at a lower level than that of the single neuron.
Minimally, what would seem to be required would be two synapses
interacting such that one synapse would deliver the signal corresponding
to Δx_i(t-j), reflecting the onset of the CS, and the other synapse would
deliver the signal corresponding to Δy(t), reflecting the onset or
offset of the US. Evidence of such interactions between synapses has
been obtained in investigations of classical conditioning in Aplysia.
The learning mechanism appears to involve what is termed
activity-dependent amplification of presynaptic facilitation (Hawkins,
Abrams, Carew, and Kandel, 1983) or activity-dependent neuromodulation
(Walters and Byrne, 1983) of sensory neuron terminals. The optimal
interstimulus interval between activation of the sensory neuron terminal
representing the CS and activation of the facilitator neuron terminal
representing the US has been found to be about 500 ms (Carew, Walters,
and Kandel, 1981; Hawkins, Carew, and Kandel, 1986). While the evidence
for conditioning at a neuronal level in Aplysia has been interpreted as
suggesting a presynaptic learning mechanism, Farley and Alkon (1985)
indicate that the sites of the changes may not be exclusively
presynaptic.

Whether presynaptic or postsynaptic processes or both underlie
learning is a question that has been investigated theoretically (Zipser,
1986) and experimentally (Carew, Hawkins, Abrams, and Kandel, 1984). In
this report, I have formulated the drive-reinforcement learning mechanism
in terms of postsynaptic processes although, as discussed above, the
learning mechanism could be implemented in an exclusively presynaptic
form. Apart from activity-dependent amplification of presynaptic
facilitation or activity-dependent neuromodulation offering a possible
implementation of the drive-reinforcement learning mechanism, other
possibilities can be envisioned that would still involve less than a
whole neuron. Portions of dendritic trees and their impinging synapses
might function in a manner analogous to the model I have envisioned for
the whole neuron. Thus there is a range of possibilities for
implementation of the drive-reinforcement learning mechanism, extending
from what is perhaps a minimal two-synapse interaction on the low end,
ranging through portions of dendritic trees functioning as a basic unit
of learning, up through the level at which a single neuron functions as
the basic unit and beyond to the point where the whole organism is
treated as a single unit. Variations of the drive-reinforcement model
may have relevance at each of these levels, even though the learning
mechanism seems to lend itself naturally to implementation at a neuronal
level.

Regarding the question of how the drive-reinforcement model can be
tested at a neuronal level, synaptic inputs will have to be controlled
and monitored precisely for a single neuron while the neuron's frequency
of firing is continually monitored. It will be necessary to measure the
direction and preferably also the magnitude of the changes in efficacy of
affected synapses. Changes in synaptic inputs, as potential CSs, and
changes in neuronal outputs, representing potential reinforcement, will
have to be tested to determine which, if any, input and output patterns
yield changes in the efficacy of synapses. In this way, it can be
established whether onsets and offsets of hypothesized neuronal CSs and
USs determine the efficacy of synapses in the manner specified by the
drive-reinforcement model.
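
The input and output patterns to be tested can be tabulated in a short
sketch. It assumes an excitatory synapse with nonzero efficacy, a CS
change that falls within the interval τ before the US change, and the
reading of the model in which only positive presynaptic changes are
eligible and the sign of the postsynaptic change sets the direction of
learning. The function name and the +1/0/-1 coding are hypothetical.

```python
def predicted_direction(cs_change, us_change):
    """Predicted sign of the change in synaptic efficacy.

    cs_change: earlier presynaptic change (positive = CS onset,
        negative = CS offset).
    us_change: later postsynaptic change (positive = US onset,
        negative = US offset).
    Returns +1 (increase), -1 (decrease), or 0 (no change expected).
    """
    if cs_change <= 0:
        return 0            # CS offsets are assumed not to be eligible
    if us_change > 0:
        return 1            # drive induction: learned excitation
    if us_change < 0:
        return -1           # drive reduction: learned inhibition
    return 0                # no postsynaptic change, no reinforcement


# Enumerate the four onset/offset pairings an experiment would probe.
for cs_name, cs in (("CS onset", 1.0), ("CS offset", -1.0)):
    for us_name, us in (("US onset", 1.0), ("US offset", -1.0)):
        print(cs_name, "then", us_name, "->", predicted_direction(cs, us))
```

A measured direction of change that departs from this table for any of
the four pairings would count against the model as read here.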

Experimental evidence that bears on this question of neuronal
learning mechanisms has been obtained from studies involving the
phenomenon of long-term potentiation (LTP). The results have been
interpreted to suggest that neurons are Hebbian in character with
respect to their learning mechanisms (Levy, 1985; Levy and Desmond, 1985;
Kelso, Ganong and Brown, 1986). However, the relationship of the
phenomenon of LTP to learning is unclear at this time (Morris and Baker,
1984). As Bliss and Lømo (1973, p. 355) point out in the article in
which they reported their discovery of LTP: "Whether or not the intact
animal makes use in real life of a property which has been revealed by
synchronous, repetitive volleys to a population of fibres the normal rate
and pattern along which are unknown, is another matter."

Recent  experimental  results  involving  LTP  suggest  that  sequential 
neuronal  inputs  may  be  more  efficacious  than  simultaneous  inputs  in 
causing  synaptic  weight  changes  to  occur.  Larson  and  Lynch  (1986)  have 
shown  that  brief  high  frequency  pulse  trains  delivered  to  nonoverlapping 
sets  of  synapses  of  hippocampal  neurons  are  most  effective  in  inducing 
LTP  if  the  pulse  train  to  a  first  set  of  synapses  precedes  a  pulse  train 
to  a  second  set  by  200  milliseconds.  With  this  experimental  procedure, 
LTP  is  induced  only  in  the  second  set  of  synapses.  LTP  is  not  induced  in 
either  set  of  synapses  if  the  delay  is  reduced  to  zero  or  extended  to  two 
seconds. 

Recently, long-term depression (LTD) of parallel fiber test
responses after conjunctive stimulation of parallel and climbing fiber
inputs has been demonstrated in the cerebellum (Ito, Sakurai, and
Tongroach, 1982; Ito, 1986). However, the relationship of this
phenomenon to classical conditioning is unclear at this time because, as
Byrne (1987, p. 411) notes: "Activation of parallel fiber input during
the period between 20 ms prior and 150 ms after climbing fiber
stimulation were roughly equivalent in inducing LTD [Ekerot and Kano,
cited in Ito, 1984]. This indicates that the neural analog of the US
(climbing fiber input) can induce a change in the neural analog of the CS
(parallel fiber input) even if the CS occurs after the US. Therefore the
intrinsic mechanism could support backward conditioning, a phenomenon
that is not observed with behavioral conditioning."

Additional experimental evidence relevant to the question of
neuronal learning mechanisms has been obtained by Baranyi and Feher
(1978, 1981 a, b, c) who have attempted to classically condition
pyramidal neurons in the cat's motor cortex. CSs in the form of
presynaptic activity were paired with USs in the form of postsynaptic
cell firing. Evidence of conditioning was obtained in the form of
enhanced EPSPs, with the enhancement being sustained for up to 41
minutes. The relationship of these experimental results to classical
conditioning phenomena remains to be demonstrated, however, because
evidence of conditioning was obtained for interstimulus intervals ranging
from 0 to 400 ms and for either forward or backward pairing of the CS and
US.

In summary, Baudry (1987, p. 168), in a group report from a Dahlem
Workshop, offered this assessment of some of the experimental evidence
discussed above: "For discrete stimulus-response learning (i.e.,
skeletal muscle responses), no learning occurs with backward (UCS first)
or simultaneous onset or in fact until the CS precedes the UCS by nearly
100 ms. Learning is best with intervals from 200 to 400 ms and decreases
as the interval is lengthened further. In terms of current models, the
Aplysia system seems to follow this function remarkably well and this
seems also to be the case for Hermissenda [Lederhendler and Alkon, 1986].
It is not yet clear how LTP and LTD could satisfy this function although
the newly described paradigm to obtain LTP [Larson and Lynch, 1986] also
seems to follow this temporal specificity."


SECTION  6 


DISCUSSION 

The  learning  mechanism  underlying  nervous  system  function  (if, 
indeed,  there  is  a  single  basic  mechanism)  may  not  be  of  the  character 
suggested by the Hebbian neuronal model. The Hebbian model suggests that
approximately  simultaneous  neuronal  signals  are  associated.  The 
drive-reinforcement  neuronal  model,  on  the  other  hand,  suggests  that 
sequential  changes  in  neuronal  signals  are  associated.  An  implication  of 
the  drive-reinforcement  model  is  that  nervous  systems,  in  effect,  pay 
attention  to  changes,  encoding  causal  relationships  between  changes  as 
the  basis  for  learning. 
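
The contrast between the two models can be illustrated with a small
sketch. It assumes the common Hebbian form in which the weight change
is proportional to the product of concurrent signal levels, and a
simplified change-based form with a single one-step lag in which the
weight change is proportional to an earlier positive presynaptic change
times a later postsynaptic change. Both function names, the fixed
weight, and the constant c are hypothetical, and weights are held fixed
so that only the accumulated tendency to change is compared.

```python
def hebbian_total(x, y, c=0.1):
    """Accumulated Hebbian change: sum over t of c * x(t) * y(t)."""
    return sum(c * xt * yt for xt, yt in zip(x, y))


def change_based_total(x, y, w=0.5, c=0.1):
    """Accumulated change-based tendency with a one-step lag.

    Sums c * |w| * dx(t-1) * dy(t), counting only positive dx.
    """
    total = 0.0
    for t in range(2, len(x)):
        dx_prev = x[t - 1] - x[t - 2]   # earlier presynaptic change
        dy_now = y[t] - y[t - 1]        # later postsynaptic change
        if dx_prev > 0:
            total += c * abs(w) * dx_prev * dy_now
    return total


# Steady, simultaneous signals: no changes anywhere in the record.
steady_x, steady_y = [1, 1, 1, 1], [1, 1, 1, 1]
# Sequential onsets: input rises at step 1, output rises at step 2.
seq_x, seq_y = [0, 1, 1, 1], [0, 0, 1, 1]

print(hebbian_total(steady_x, steady_y), change_based_total(steady_x, steady_y))
print(hebbian_total(seq_x, seq_y), change_based_total(seq_x, seq_y))
```

Under these assumptions, sustained coincident activity drives the
Hebbian sum but leaves the change-based sum at zero, whereas an input
onset followed by an output onset produces a positive change-based sum.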

Besides  psychology  and  neuroscience,  several  other  disciplines  have 
been  addressing  questions  related  to  learning.  These  disciplines  include 
(a) the cybernetically oriented efforts referred to as connectionist or
neural network modeling, (b) artificial intelligence research, and (c)
adaptive control theory and adaptive signal processing. In this section,
the implications of the drive-reinforcement neuronal model for each of
these  approaches  will  be  considered. 


Connectionist and neural network modeling


For a few decades now, neural network models, or what are sometimes
more generally referred to as connectionist models, have been proposed as
theoretical models of nervous system function. Connectionist models have
also been proposed as engineering solutions to problems, without any
claim of biological relevance. In either case, with or without the claim
of biological relevance, the thrust of connectionist modeling has been to
address the issues of memory, learning and intelligence by means of
cybernetically oriented designs for massively parallel systems (Hinton
and Anderson, 1981; Grossberg, 1982, 1987; Klopf, 1982; Levine, 1983;
Kohonen, 1984; Barto, 1985; Feldman, 1985; Rumelhart and McClelland,
1986; McClelland and Rumelhart, 1986).

In  recent  years,  several  approaches  to  connectionist  modeling  have 
come  to  the  fore,  these  approaches  appearing  to  have  promise  in  terms  of 
solving the problem of accomplishing learning in large, deep networks.
The  ultimate  potential  of  these  approaches  cannot  be  assessed  yet  because 
efforts  to  scale  up  the  respective  connectionist  networks  are  only 
beginning.  What  can  be  done  at  this  point  and  what  I  will  attempt  to  do 
here  is  to  assess  some  of  the  approaches  for  their  relevance  to  animal 
learning. 

One  dimension  along  which  connectionist  models  may  be  assessed  has 
to  do  with  the  nature  of  the  feedback  the  models  require  froni  their 
environments.  Some  connectionist  models  operate  in  a  strictly  open  loop 
fashion,  requiring  no  feedback  from  their  environment.  An  example  is  the 
connectionist  model  due  to  Fukushima  (1980,  1982).  Fukushinia's  network, 
when  presented  with  spatial  patterns,  adjusts  connection  weights  so  that 
the  patterns  tend  to  cluster  in  useful  ways,  for  some  purposes  of  pattern 
classification.  No  feedback  from  the  environment  is  given  or  required. 
One  question  that  arises  is  whether  networks  operating  in  this  way,  in  an 
open  loop  or  nongoal-seeking  fashion,  can  be  relevant  to  biological 
information processing. An implication of the drive-reinforcement
neuronal model and of the learning theory implied by the model is that
feedback  loops  through  the  environment  are  a  fundamental  part  of 
biological  information  processing.  In  biological  systems,  it  appears 
that  positive  and  negative  feedback  loops,  constituting  drives,  support 
goal-seeking  and  that  the  changes  in  the  levels  of  activity  of  these 
closed  loops  or  drives  constitute  reinforcement. 

Nearest  neighbor  classifications  of  spatial  patterns,  like  that 
accomplished  with  Fukushima's  clustering  technique,  can  also  be 
accomplished  with  Boltzmann  machines  (Hinton,  Sejnowski,  and  Ackley, 
1984;  Ackley,  Hinton,  and  Sejnowski,  1985;  Hinton  and  Sejnowski,  1986) 
and  what  are  sometimes  called  Hopfield  networks  (Hopfield,  1982;  Cohen 
and  Grossberg,  1983;  Hopfield,  1984;  Hopfield  and  Tank,  1985,  1986; 
Tesauro,  1986).  These  latter  two  classes  of  connectionist  models,  having 
been  inspired  by  theoretical  models  in  physics,  utilize  symmetric 
connections  and  what  may  be  called  adaptive  equilibrium  processes  in 
which  the  networks  settle  into  minimal  energy  states.  The  networks  have 
been  demonstrated  to  have  interesting  and  potentially  useful  properties 
including,  for  example,  in  the  case  of  Hopfield  networks,  solving  analogs 
of  the  traveling  salesman  problem.  However,  symmetric  network 
connections  and  adaptive  equilibrium  processes  have  not  yet  been 
demonstrated  to  be  relevant  to  the  modeling  of  nervous  system  function, 
at  least  with  regard  to  the  underlying  learning  mechanisms.  It  may  be 
noted  that  a  wide  range  of  classical  conditioning  phenomena  are  predicted 
by  the  drive-reinforcement  neuronal  model  and  it  uses  no  symmetric 
connections or adaptive equilibrium processes. What the
drive-reinforcement neuronal model does utilize is the real-time
operation of drives and reinforcers that can be understood in terms of a
network's ongoing, closed loop interactions with its environment.

Continuing  to  look  at  connectionist  models  in  terms  of  the  nature  of 
the  feedback  they  require  from  their  environment,  a  class  of  models  that 
might be considered the other extreme from open loop models are those
using supervised learning mechanisms. Such network models require
detailed feedback in the form of an error signal indicating the
difference between a desired output and the network's actual output.
Rosenblatt (1962), Widrow (1962), and subsequently many others have
investigated connectionist models utilizing supervised learning
mechanisms. For these network models, error signals play no role in a
theoretical neuron's computations relative to its input-output
relationship, their only role being to instruct the neuron with
regard to the modification of its synaptic weights. Supervised
learning mechanisms introduce the need for a "teacher" to provide
a learning system with desired responses. In contrast, the
drive-reinforcement neuronal model, like some other real-time
learning mechanisms, does not require the introduction of a teacher and,
thus, is an example of an unsupervised learning mechanism. In the case
of the drive-reinforcement neuronal model, fixed (nonplastic) synapses
mediating  USs  function  like  an  internal  teacher  or  goal  specification. 
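
The role of the error signal in a supervised mechanism can be illustrated
with a minimal sketch of a Widrow-Hoff style (least mean squares) update.
The function name, the learning rate, and the example values are
assumptions chosen for illustration and do not come from this report. The
point to notice is that the error does not enter the computation of the
neuron's output; it is used only to adjust the weights, and it cannot be
formed unless a teacher supplies the desired response.

```python
def lms_step(weights, inputs, desired, rate=0.1):
    """One Widrow-Hoff style update (illustrative sketch).

    `desired` must be supplied by an external teacher; `rate` is a
    hypothetical learning-rate parameter.
    """
    # The output depends only on the weights and inputs.
    output = sum(w * x for w, x in zip(weights, inputs))
    # Error signal: desired output minus actual output.
    error = desired - output
    # The error is used solely to modify the synaptic weights.
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error


new_weights, error = lms_step([0.0, 0.0], [1.0, 0.0], desired=1.0, rate=0.5)
```

In this example only the weight of the active input changes, and it moves
toward the value that would have produced the teacher's desired output.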

It  should  be  noted  that  unsupervised  learning  mechanisms  have 
sometimes  been  associated  with  systems  that  operate  in  an  open  loop  mode 
with respect to their environment. Unsupervised learning mechanisms have
also been associated with clustering techniques as an approach to pattern
recognition. However, as defined here, unsupervised learning mechanisms
represent that class of learning mechanisms that do not require a teacher
external to the learning system and, thus, they may be utilized in
learning  systems  that  operate  either  in  an  open  or  closed  loop  mode  with 
respect  to  their  environment. 

The  distinction  between  unsupervised  learning  mechanisms  that  do  not 
require  a  teacher  and  supervised  learning  mechanisms  that  do  require  a 
teacher  would  appear  to  be  of  fundamental  importance.  While  supervised 
learning  mechanisms  may  have  a  role  to  play  in  artificial  intelligence, 
it  would  seem  that  only  unsupervised  learning  mechanisms  are  likely  to  be 
relevant  to  the  modeling  of  natural  intelligence.  In  general,  biological 
systems  accomplish  learning  without  a  teacher  being  present  in  any 
explicit  sense.  Of  course,  a  biological  system's  environment  always 
functions  as  a  teacher  in  an  implicit  sense  but  that  is  exactly  what 
real-time  unsupervised  learning  mechanisms  can  take  into  account,  as 
could  be  seen  in  the  results  of  the  computer  simulations  of  the 
drive-reinforcement  neuronal  model  presented  earlier. 

One  qualification  is  in  order  regarding  the  role  of  supervised 
learning  mechanisms  in  natural  intelligence.  It  is  clear  that  something 
like  supervised  learning  mechanisms  play  a  large  part  in  natural 
intelligence  at  higher,  cognitive  levels.  At  such  levels,  explicit 
teachers play an important role. However, I suggest that this has misled
neural  network  modelers,  causing  them  to  introduce  supervised  learning 
mechanisms at a fundamental level. It is this hypothesized fundamental
role for supervised learning mechanisms that I think is unlikely to be
valid  in  the  case  of  neural  network  or  connectionist  models,  if  the 
models  are  intended  to  be  relevant  for  natural  intelligence. 

Regarding  connectionist  models  that  employ  supervised  learning 
mechanisms,  the  most  promising  recent  form  of  this  class  of  models  is  due 
to  Werbos  (1974),  Parker  (1982,  1985),  Le  Cun  (1985),  and  Rumelhart, 
Hinton  and  Williams  (1985,  1986).  They  have  proposed  mechanisms  for 
propagating  error  signals  from  the  output  layer  back  to  the  input  layer 
of  a  network.  The  performance  of  the  resulting  networks  has  been 
encouraging  and,  therefore,  the  question  arises  of  whether  these 
connectionist  models  may  be  relevant  to  the  understanding  of  animal 
learning.  Such  relevance  seems  unlikely  for  two  reasons  that,  in  part,  I 
have  already  discussed.  First,  animals  do  not  receive  error  signals 
during  learning  except,  in  the  case  of  humans,  after  a  fairly  high  level 
of  cognitive  function  has  been  achieved.  Second,  the  drive-reinforcement 
neuronal  model  demonstrates  that,  at  least  for  classical  conditioning 
phenomena  that  appear  to  be  fundamental  to  learning,  back  propagating 
error  correction  mechanisms  are  not  required. 

Recognizing  that  animal  learning  does  not,  in  general,  involve 
evaluative  feedback  from  the  environment,  some  investigators  have  moved 
away  from  supervised  learning  in  which  error  signals  must  be  provided  to 
the  learning  system.  A  step  in  the  direction  of  unsupervised  learning  is 
reinforcement  learning  (Farley  and  Clark,  1954;  Minsky,  1954;  Barto  and 
Sutton, 1981a, 1981b; Sutton, 1984; Barto and Anandan, 1985; Barto and
Anderson, 1985) or what Widrow, Gupta, and Maitra (1973) have called
learning with a critic. Williams (1986, 1987) notes that in this type of
learning  the  network  may  be  provided  with  performance  feedback  as  simple 
as  a  scalar  signal,  termed  reinforcement,  that  indicates  the  network's 
degree  of  success.  Reinforcement  learning  networks  have  been 
demonstrated to be workable (e.g., see Barto, Sutton, and Anderson,
1983),  at  least  in  the  case  of  small  scale  versions.  Furthermore, 
reinforcement  learning  networks  appear  more  likely  to  be  biologically 
relevant  than  supervised  learning  networks  because  less  evaluative 
feedback is required from the environment. However, an implication of
the  drive-reinforcement  neuronal  model  is  that  environmental  feedback 
does  not  come  in  the  form  of  reinforcement  but,  rather,  comes  in  the  form 
of  changes  in  drive  levels.  Biological  systems  appear  to  compute  their 
own  reinforcement  by  utilizing  learning  mechanisms  that  compare  current 
and recent drive levels. In this way, a drive-reinforcement learning
mechanism  requires  no  evaluative  feedback  from  the  environment.  The 
environment  simply  provides  sensory  input,  some  of  which  functions  as 
drives,  and  when  the  drive  levels  change,  it  is  hypothesized  that  neurons 
and  nervous  systems  treat  these  changes  in  drive  levels  as  reinforcement. 
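
The idea that a system can compute its own reinforcement from drive
levels can be sketched in a few lines. This is a minimal illustration
under stated assumptions: the function name and the example drive values
are hypothetical, and the sketch compares only the current and the
immediately preceding drive level. No evaluative signal is passed in;
the only input is the sensed drive level itself.

```python
def reinforcement_from_drive(drive_levels):
    """Return the step-to-step changes in a sequence of drive levels.

    Each change is treated as a self-computed reinforcement signal
    (illustrative sketch; names and values are assumptions).
    """
    return [current - previous
            for previous, current in zip(drive_levels, drive_levels[1:])]


# Hypothetical drive level sampled at five successive time steps.
hunger = [4, 4, 2, 1, 1]
changes = reinforcement_from_drive(hunger)  # [0, -2, -1, 0]
```

A constant drive level yields zero reinforcement under this sketch; only
the time steps at which the level changes produce a nonzero signal.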

Having used the expression, "evaluative feedback," I should define
it. By evaluative feedback, I mean any kind of signal that requires the
environment (actually, a "teacher" or "trainer" in the environment) to
make some judgment about the performance of the learning system that is
receiving the feedback. In an extreme case, that could mean the teacher
or trainer would have to know the desired response and would then inform
the learning system of the direction and magnitude of its error. In a
less extreme case, the teacher or trainer could utilize implicit or
explicit criteria to form judgments about whether the learning system's
performance was improving or not and then signal these evaluations of
relative levels of performance to the learning system. Nonevaluative
feedback, then, is any signal a learning system can generate for itself,
without the aid of a teacher or trainer, simply by having an appropriate
sensor with which to detect events in the environment.


Whether feedback comes to a learning system in the form of drives,
reinforcers, or error signals has relevance with regard to two further
questions: What should constitute the innate knowledge in a learning
system and what form should the innate knowledge take? A reinforcement
or a supervised learning system will, innately, know how to utilize
reinforcement signals or error signals to discover appropriate drives. A
drive-reinforcement learning system, on the other hand, will begin with
some primary drives in place and will then acquire additional drives,
utilizing changes in the current drives as reinforcers. Biological
systems appear to take this latter approach, beginning with some primary
or innate drives and then building acquired drives on top of them.


This approach may offer a solution to a fundamental problem in
connectionist modeling. A basic question has been that of how the
network elements or neurons in a large, deep, multilayered network can
learn to respond properly without direct feedback from a teacher
informing them of what their correct responses should have been at each
step along the way. The answer suggested by drive-reinforcement learning
theory, as outlined earlier, is to utilize whatever network drives
(feedback loops) are already in place and then treat changes in drive
levels as reinforcers. In this way, reinforcement signals are always
available locally (i.e., changes in neuronal drive levels can be computed
locally) and, thus, there would appear to be no requirement for a
teacher, trainer or critic at any level in the network. (This does not
preclude the eventual evolution, at higher levels in a neural network, of
global reinforcement centers that could aid the process of learning by
providing overall direction.) Additional theoretical work including
computer simulations of large, deep networks will be required to test
this idea that drive-reinforcement learning mechanisms will enable
multilayered  networks  to  learn  to  model  their  environment  appropriately 
without  evaluative  feedback  from  the  environment. 

Having  examined  the  kinds  of  environmental  feedback  required  by 
various classes of connectionist models, let us now consider the related
question of what kinds of goals are implemented in these networks. In
supervised  learning  systems,  the  goal  is  to  minimize  the  error  signal. 
In  reinforcement  learning  systems,  the  goal  may  be  to  maximize  a  scalar 
associated with the reinforcement function. In drive-reinforcement
learning  systems,  the  goal  may  be  to  reduce  drives  although,  as  discussed 
in  an  earlier  section  of  this  report,  biological  systems  don't  always 
appear  to  be  reducing  drives  and,  even  if  they  are,  the  behavioral 
manifestations can be subtle and complex. Some of the subtleties and
complexities  may  be  due  to  global  reinforcement  centers  arising  in 
nervous  systems  at  the  level  of  the  limbic  system  and  hypothalamus.  Such 
global  reinforcement  centers  may,  in  part,  be  responsible  for  certain 
theorists proposing reinforcement learning systems as models of nervous
system  function.  At  a  still  higher  level  of  nervous  system  function, 
cognitive  processing  appears  to  have  motivated  the  introduction  of 
supervised  learning  systems  as  theoretical  models.  From  this 
perspective, we see that the drive-reinforcement learning mechanism might
reflect  the  neuronal  level  of  nervous  system  function,  with  reinforcement 
and  supervised  learning  mechanisms  reflecting  progressively  higher  levels 
of  function.  It  would  seem  then  that  it  is  important  to  be  clear  about 
what  level  of  nervous  system  function  one  is  modeling.  Furthermore, 
modeling  higher  levels  of  nervous  system  function  may  require  taking  into 
account  the  nature  of  the  learning  mechanisms  that  operate  at  lower 
levels. 

Regarding  drive  reduction  as  the  possible  goal  of  biological  systems 
and, perhaps, as the goal of drive-reinforcement networks, one point
that should be made is that drive reduction would seem to be the goal for
drives that are implemented as negative feedback loops. Drives
implemented as positive feedback loops would seem to support the goal of
drive induction rather than drive reduction. Having said this, it may
then be observed that, in the case of biological systems, drive
induction, as in the pursuit of prey, always seems to be followed by
drive reduction, as in the consumption of prey. This may suggest a
simple general principle for the design (or evolution) of
drive-reinforcement networks: primary drives implemented as positive
feedback loops should always lead, when activated, to the subsequent
activation of primary drives that are implemented as negative feedback
loops. If this principle is followed, then all drives will, ultimately,
support the goal of drive reduction. This may help to insure the
stability of learning systems.


I have traversed the conceptual or theoretical territory of
connectionist models twice now, once looking at the kinds of feedback
various classes of models require from their environments and once
looking at the nature of the goals implemented in these models. I want
to make one more pass, examining the algorithmic or heuristic character
of various connectionist models.

Supervised learning mechanisms, in their most recent form, in which
back propagation techniques are utilized, have a certain appeal because
of what I would suggest is their nearly algorithmic character. I mean
this in the mathematical sense in which an algorithm is defined to be a
procedure that is guaranteed to produce a result, as distinguished from a
heuristic that, like a rule of thumb, may or may not produce the desired
outcome. Back propagating error correction learning mechanisms utilize
gradient descent techniques such that they provide, with some allowances
for the problem of getting hung up on local peaks, an optimal solution to
the problem confronting the network, the problem being to arrive at the
best set of connection weights. Back propagating error correction
networks become of interest, then, from a theoretical standpoint,
irrespective of their biological relevance, because the models may
represent optimal or near optimal solutions of certain problems. Even
here, there may be difficulties though, because for the larger, deeper
networks many theorists are interested in, scaling up of back propagating
error correction networks may pose an obstacle (Plaut, Nowlan, and
Hinton, 1986).

At any rate, if we consider that back propagating error correction
networks have something of an algorithmic character, the other extreme
might be connectionist networks that utilize random search techniques to
identify reasonable values for the connection weights (e.g., see Barron,
1968).  Random  search  techniques  would  seem  to  be  about  as  far  removed 
from  an  algorithmic  character  as  a  learning  mechanism  can  get. 

In between these two extremes are such classes of models as
reinforcement and drive-reinforcement learning mechanisms that appear to
have  a  heuristic  character.  For  example,  utilizing  drives  and 
reinforcers  as  the  basis  for  learning  may  not  guarantee  correct  results 
but,  on  the  average,  such  an  approach  to  learning  appears  to  be 
effective  in  the  case  of  biological  systems. 
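
The two extremes described above can be contrasted with a toy sketch on a
single connection weight. Everything here is an assumption made for
illustration: the quadratic loss with its minimum at 3.0, the step size,
the search spread, and the fixed random seed are hypothetical and are not
taken from any of the cited models. Gradient descent follows the slope of
the loss at every step, whereas random search proposes a perturbed weight
and keeps it only if the loss improves.

```python
import random


def loss(w):
    """Toy loss with a single minimum at w = 3.0 (illustrative)."""
    return (w - 3.0) ** 2


def gradient_descent(w=0.0, rate=0.1, steps=100):
    """Follow the negative gradient of the toy loss."""
    for _ in range(steps):
        w -= rate * 2.0 * (w - 3.0)  # derivative of (w - 3)^2
    return w


def random_search(w=0.0, spread=1.0, steps=100, seed=0):
    """Keep a randomly perturbed weight only when it lowers the loss."""
    rng = random.Random(seed)
    for _ in range(steps):
        candidate = w + rng.uniform(-spread, spread)
        if loss(candidate) < loss(w):
            w = candidate
    return w
```

On a loss surface with a single minimum, as here, gradient descent
converges steadily, while random search only guarantees that the loss
never gets worse from one accepted step to the next.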

Artificial intelligence

Fundamental to the process of learning in the case of the
drive-reinforcement neuronal model is the temporal shaping of behavior.
This is in contrast to the kinds of processes that occur in artificial
intelligence where the emphasis is placed on what might be called
cognitive searching. "Cognitive" because there is an emphasis on the
rational and symbolic aspects of intelligence and "searching" because
there is an emphasis on selecting from a large number of possible
behaviors. An implication of the drive-reinforcement neuronal model is
that, fundamentally, natural intelligence and the learning mechanisms
that support it do not involve symbols or searching but, rather, actions
and shaping. Learned behavior is gradually shaped through experience to
become more appropriate. This dynamic process yields associations that
refine behavior that is already in place. Animals are continually
"riding" a large number of feedback loops that reach through the animal
and out into the environment. The more cognitively or symbolically
oriented kinds of searching through large numbers of possibilities that
humans sometimes engage in is, most likely, an emergent phenomenon that
arises out of the internalization of a very large number of causal
relations, this internalization being accomplished, it would seem, with
something like a drive-reinforcement learning mechanism that temporally
refines actions. Another way of saying this is that first we learn to
grasp an object and then we learn to grasp a problem.

The comments I am making regarding artificial intelligence research
apply as well, I feel, to cognitive science. There seems to be the view
in both of these disciplines that memory, learning and intelligence have
to do, fundamentally, with cognition. However, doesn't natural
intelligence have to do with action, emotion, and cognition? The
drive-reinforcement neuronal model contains what may be a complete set of
the fundamental elements that underlie intelligence, namely, outputs that
reflect actions, inputs and changes in inputs that reflect drives and
reinforcers, synaptic weights that represent knowledge, and changes in
synaptic weights that represent learning. The seeds of action, emotion,
and cognition appear to be present in the drive-reinforcement neuronal
model.

In such areas of artificial intelligence research as image
understanding and the related area of pattern recognition (although the
latter is sometimes more closely associated with connectionist models
than with mainstream artificial intelligence), the tendency has been to
treat the temporal aspects of intelligent information processing as too
difficult for current techniques to handle. (Some recent research
constitutes exceptions to this statement.) Often, ways have been sought
to automatically understand static scenes or to recognize spatial
patterns. The temporal aspects of natural intelligence, associated with
motion and associated with real-time information processing, in general,
have frequently not been addressed in image understanding and pattern
recognition research, the strategy seeming to be that these difficult
issues will be addressed later, when these fields of research are more
advanced. But if the temporal and, indeed, real-time aspects of natural
intelligence turn out to be fundamental with regard to learning, as the
drive-reinforcement neuronal model suggests, could it be that the goals
of image understanding and pattern recognition research will be more
easily achieved if the temporal or real-time aspects of intelligent
information processing are confronted first rather than last?

Having discussed cognitive searching and its role in artificial
intelligence, it may be useful at this point to comment on evolutionary
models of learning because such models also invoke search mechanisms in a
fundamental way. Fogel, Owens, and Walsh (1966), Klopf and Gose (1969),
and Holland (1975), for example, have proposed evolutionary models of
learning in which alternative structures or behaviors are generated
randomly or by some process that is more systematic than a purely random
one. Then, the alternatives are evaluated and the best are saved. Such
an evolutionary process appears to be fundamentally different from a
learning process. Fundamentally, learning does not appear to involve
generating and evaluating alternatives. Rather, as discussed earlier,
learning appears to involve the direct temporal shaping of behavior.
Experienced causal relationships are internalized; i.e., associations are
formed directly as a result of the experience. For example, when a bell
rings and food follows, animals form associations directly. No search
process occurs. Of course, at a higher level, searching can be
occurring. It can be seen that if an animal is exploring its environment
and causes a bell to ring and then food follows, the consequences of the
exploratory or search process may result in the direct temporal shaping
of behavior. Direct temporal shaping of behavior may be occurring then
at the most fundamental level and a search process may be occurring at a
higher level.

In summary, it could be said that an implication of the
drive-reinforcement model is that time is the teacher (that is to say,
real-time) and behavior or actions is what is taught. Ultimately, in a
phylogenetically advanced organism like a human, knowledge acquisition,
representation, and utilization become important too and then a process
like the one I am calling cognitive searching takes on increasing
importance. However, it seems that this may have misled artificial
intelligence researchers and cognitive scientists, drawing their
attention away from the underlying mechanisms that appear to have more to
do with temporal shaping. Artificial intelligence researchers have, for
example, sometimes been dismayed by the lack of common sense in the
systems they have designed. Could it be that common sense derives from
the operation of drives and reinforcers and from the kind of real-time
embedding in the environment that is characteristic of biological
systems?


Adaptive  control  theory  and  adaptive  signal  processing 

For several decades now, control theory has been successfully
applied to the problems of analyzing and synthesizing automatic control
systems. Adaptive control theory seeks to extend control system
applications to cases in which adaptation or learning is required on the
part of the automatic controller (e.g., see Chalam, 1987). In this way,
control theory contacts the problem of learning in the context of
engineering  applications. 

Related  to  the  subject  of  adaptive  control  theory  is  adaptive  signal 
processing (e.g., see Widrow and Stearns, 1988). In both adaptive
control and adaptive signal processing, it is sometimes assumed that a
"desired response" or "training signal" is available with which the
controller's or signal processor's actual output can be compared for the
purpose of learning. Drive-reinforcement learning theory, as outlined
earlier, suggests an alternative way to extend control theory or signal
processing techniques for the case of learning, such that no knowledge of
a desired response or training signal is required when the learning
system  is  operating. 


In the drive-reinforcement learning theory outlined earlier, network
drives are fundamental. In control theory, negative feedback loops are
fundamental. But network drives, as I have defined them, and negative
feedback loops are one and the same thing. (One qualification: in
biological systems, network drives may also occasionally be positive
feedback loops.) One sees that drive-reinforcement theory and control
theory start on the same basis. It can then be seen that
drive-reinforcement theory suggests a "natural" learning mechanism for
control and signal processing systems. While I am not aware of any
adaptive control or signal processing systems using lagged derivatives of
inputs and outputs as a basis for adaptation, such a learning mechanism
would seem to constitute a straightforward extension of conventional
control system and signal processing techniques.

The essence of the drive-reinforcement learning mechanism, in
adaptive control theoretic terms, can be simply stated. A network of
drive-reinforcement neurons, viewed as a control system, will interact
with its environment through some set of positive and negative feedback
loops. Pursuit of prey, for example, may involve positive feedback
loops, as noted earlier, and avoidance of predators may involve negative
feedback loops. At any given time, a biological system will be
interacting with its environment through a set of actual positive and
negative feedback loops that constitute its current primary and acquired
drives and through a set of potential positive and negative feedback
loops that constitute possible future acquired drives. Potential
acquired drives will become actual if the inputs for the potential drives
become active no more than τ time steps before any of the current actual
drives change their levels of activity. In this way, what may be called
a drive-reinforcement controller will learn to control its output not
only to deliver more or less of a control signal (as current adaptive
controllers do) but also to deliver the control signal sooner or later.
That is to say, a drive-reinforcement controller would be expected to
modify not only the amplitudes of its responses but also the timing.
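
The timing condition in this paragraph can be made concrete with a
minimal sketch of a weight update built from lagged derivatives. It
assumes that the learning rule has the form: weight change equals the
current change in output multiplied by a sum, over lags of 1 to tau time
steps, of a lag coefficient times the absolute value of the lagged weight
times the lagged change in input, with only positive input changes
counted. That form, and the names `delta_y`, `past_weights`,
`past_delta_x`, and `lag_coeffs`, are assumptions adopted for this
illustration and should be checked against the statement of the model
given earlier in the report.

```python
def dr_weight_change(delta_y, past_weights, past_delta_x, lag_coeffs):
    """Sketch of a lagged-derivative weight update (assumed form).

    The lists are indexed by lag: element 0 holds the value one time
    step back, element 1 the value two steps back, and so on, up to
    tau = len(lag_coeffs) steps.
    """
    total = 0.0
    for c_j, w, dx in zip(lag_coeffs, past_weights, past_delta_x):
        # Only input onsets (positive changes) are counted (assumption).
        total += c_j * abs(w) * max(dx, 0.0)
    return delta_y * total


# Input onset two steps before the output (drive) change: within the window.
inside = dr_weight_change(1.0, [0.5, 0.5], [0.0, 1.0], [2.0, 1.0])

# No input change within the last tau = 2 steps: no learning occurs.
outside = dr_weight_change(1.0, [0.5, 0.5], [0.0, 0.0], [2.0, 1.0])
```

Under these assumptions, an input whose onset falls inside the window of
tau time steps before a change in output has its weight modified, while
an onset that falls outside the window leaves the weight unchanged.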

Memory and learning

Before concluding this discussion of some of the implications of the
drive-reinforcement neuronal model, a few words should be said about
memory and how it relates to learning. As Squire (1986) notes, in
phylogenetically old animals such as invertebrates, what is learned takes
the form of procedural memories. In phylogenetically recent animals such
as mammals, what is learned can also take the form of declarative
memories. The distinction between procedural and declarative memories is
that between skills and procedures, on the one hand, and specific facts
and data, on the other.

The drive-reinforcement learning mechanism appears to be well suited
for the laying down of procedural memories because the learning mechanism
treats time as a fundamental dimension, utilizing time derivatives of the
neuronal inputs and outputs and correlating the derivatives across a
temporal interval. If the drive-reinforcement learning mechanism should
turn out to be the learning mechanism for acquiring procedural memories,
could it also turn out to be the learning mechanism for acquiring
declarative memories? To see how this could be a possibility, it may be
necessary to consider the interaction of the brain's attention mechanism
with the registration of sensory and other information in the cerebral
cortex. The medial temporal cortex and especially the hippocampal
formation and associated areas appear to be important with respect to
declarative memories. Squire (1986) notes that the capacity for
declarative memories reaches its greatest development in mammals in which
these cortical structures are most fully elaborated. Given our tendency
to remember that to which we attend, might it be that signals generated
by the attention mechanism, the signals originating perhaps in the
thalamic reticular formation (Klopf, 1982), interact with sensory and
other information registering in the medial temporal cortex, such that
the temporal relationships specified by the drive-reinforcement learning
mechanism are satisfied and declarative memories result? In general,
could the role of the attention mechanism in the laying down of both
procedural and declarative memories be the induction of Δy's at
appropriate times relative to Δx's so that the resulting synaptic weight
changes yield learning?
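
The temporal relationship referred to above can be made explicit by
writing out the learning rule. The form below is restated here as an
assumption, using the symbols of the appendix parameter list (learning
rate constants c_j, interval τ, synaptic weights w_i, inputs x_i, and
output y); the model definition earlier in the report should be treated
as authoritative if it differs:

```latex
\Delta w_i(t) = \Delta y(t) \sum_{j=1}^{\tau} c_j \, \lvert w_i(t-j) \rvert \, \Delta x_i(t-j),
\qquad
\Delta y(t) = y(t) - y(t-1),
\quad
\Delta x_i(t-j) = x_i(t-j) - x_i(t-j-1)
```

Read this way, a synaptic weight changes only when a change in the
output, Δy(t), follows a change in that synapse's input by 1 to τ time
steps, which is the sense in which an attention signal would have to
induce Δy's at appropriate times relative to Δx's.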


SECTION  7 


CONCLUDING  REMARKS 

In the Foreword to Olds' (1977) book on Drives and Reinforcements,
Neal Miller remarks (p. v): "A fundamental step in the line of evolution
leading to human behavior was the development of learning, a new process
of adaptation that could occur far more rapidly within the lifetime of
the individual instead of slowly during the evolution of the species. In
determining which particular response will be performed and learned, the
selective factor is reinforcement which, in turn, is closely related to
the drives that are active at a given time." In this report, I have
attempted to relate drives and reinforcers by means of a theoretical
model of neuronal function. The model has been demonstrated to predict a
wide range of classical conditioning phenomena. Implications of the
model have been considered for the fields of animal learning theory,
connectionist and neural network modeling, artificial intelligence
research, adaptive control theory, and adaptive signal processing. It
has been concluded that a real-time learning mechanism that does not
require evaluative feedback from the environment may be fundamental to
natural intelligence and that such a learning mechanism may have
implications for artificial intelligence.

In addition to accomplishing experimental tests of the neuronal
model, a useful next step may be to simulate networks of the proposed
theoretical neurons to determine the properties of the networks, in
general, and, in particular, to determine if instrumental conditioning
phenomena emerge. Actually, in pursuing this theoretical work, it may be
useful to simulate not only the neural network but also a simplified
organism controlled by the neural network and a simplified environment
with which the organism is interacting. [See Barto and Sutton (1981b)
for an example of how this kind of simulation can be carried out.] By
means of such computer simulations of nervous systems, organisms, and
environments, it may become possible to make behavioral observations on a
mathematically well defined network of drive-reinforcement neurons during
the process of learning.
APPENDIX: Parameter Specifications for the Computer Simulations
of the Neuronal Models

Drive-reinforcement model

Learning rate constants: c_1 = 5.0, c_2 = 3.0, c_3 = 1.5, c_4 = 0.75, c_5 = 0.25 (τ = 5)

CS initial synaptic weight values [i.e., w_i(t) at t=0]: +0.1 (excitatory
weights), -0.1 (inhibitory weights). Exceptions: For the
simulations reported in Figures 12 and 18, the initial values
of the inhibitory synaptic weights were 0.0, thus preventing
the inhibitory weights from changing during these simulations.
This was done to simplify the graphs and to focus attention on
the excitatory weights that were primarily responsible for the
phenomena being manifested. Had the initial inhibitory
synaptic weight values for Figures 12 and 18 been set at -0.1,
as was done for the other simulations, small changes in
inhibitory weights would have been observed at some points in
these simulations while the overall phenomena being manifested
would have remained unchanged.

US (nonplastic) synaptic weight values: +1.0 (excitatory weight) and 0.0
(inhibitory weight).

Lower bound on synaptic weights: |w_i(t)| ≥ 0.1

Neuronal output limits: 0.0 ≤ y(t) ≤ 1.0

Neuronal threshold: θ = 0.0

CS amplitudes (measured relative to zero-level baseline): 0.2 except for
Figure 7 where the amplitudes were 1.0, 0.5, and 0.25 for CS_1,
CS_2, and CS_3, respectively, and Figure 17 where the amplitudes
were 0.2, 0.2, and 0.4 for CS_1, CS_2, and CS_3, respectively.


US amplitudes (measured relative to zero-level baseline): 0.5 except for
Figure b where the US amplitudes were 1.0, 0.5, and 0.25 for the
USs occurring in conjunction with CS_1, CS_2, and CS_3,
respectively.

CS  and  US  timing:  See  Table  1  for  times  of  onset  and  offset  of  CSs  and 
USs  within  a  trial.  Also  specified  in  Table  1  are  the  trials 
during  which  each  CS  and  US  was  present.  For  all  of  the  CS-US 
configurations,  the  time  of  onset  of  the  first  stimulus  was 
arbitrarily  chosen  to  be  10.  Onset  of  a  stimulus  at  time  step, 
t,  means  that  the  stimulus  was  on  during  time  step,  t,  and  was 
not  on  during  the  previous  time  step.  Offset  of  a  stimulus  at 
time  step,  t,  means  that  the  stimulus  was  off  during  time  step,
t,  and  was  not  off  during  the  previous  time  step. 
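
The parameter values and the onset/offset convention above can be
collected into a minimal simulation sketch. The constants are taken
from this appendix; everything else is an assumption made for
illustration: the function names `stimulus`, `output`, and `run_trial`
are hypothetical, only a single excitatory CS weight is modeled, the
weight-update rule is the drive-reinforcement rule as assumed from the
model definition earlier in the report (with only positive input changes
contributing), and the CS-US times in the usage note are not those of
Table 1.

```python
# Constants from the appendix.
C = [5.0, 3.0, 1.5, 0.75, 0.25]  # learning rate constants c_1..c_5 (tau = 5)
TAU = len(C)
W_INIT = 0.1                     # initial excitatory CS weight
W_MIN = 0.1                      # lower bound on an excitatory weight
W_US = 1.0                       # nonplastic excitatory US weight
THETA = 0.0                      # neuronal threshold
Y_MIN, Y_MAX = 0.0, 1.0          # neuronal output limits


def stimulus(onset, offset, amplitude, n_steps):
    """Signal that is on from step `onset` through step `offset - 1`.

    This follows the appendix convention: the stimulus is on during the
    onset step and off during the offset step.
    """
    return [amplitude if onset <= t < offset else 0.0 for t in range(n_steps)]


def output(w_cs, x_cs, x_us):
    """Weighted input sum minus threshold, clipped to the output limits."""
    y = w_cs * x_cs + W_US * x_us - THETA
    return min(Y_MAX, max(Y_MIN, y))


def run_trial(w, cs, us):
    """Run one trial and return the updated excitatory CS weight.

    Assumed rule: dw(t) = dy(t) * sum_j c_j * |w(t-j)| * dx(t-j), j = 1..tau,
    where only positive dx (stimulus onsets) contribute.
    """
    w_hist = [w]   # w_hist[t] is the weight in effect at time step t
    y_prev = 0.0   # assumed output before the trial starts
    for t in range(len(cs)):
        y = output(w_hist[t], cs[t], us[t])
        dy = y - y_prev
        total = 0.0
        for j in range(1, TAU + 1):
            if t - j - 1 < 0:
                continue  # not enough history yet for x(t-j-1)
            dx = cs[t - j] - cs[t - j - 1]
            if dx > 0.0:
                total += C[j - 1] * abs(w_hist[t - j]) * dx
        w_hist.append(max(W_MIN, w_hist[t] + dy * total))
        y_prev = y
    return w_hist[-1]


if __name__ == "__main__":
    cs = stimulus(10, 14, 0.2, 20)   # hypothetical CS: on for steps 10..13
    us = stimulus(13, 14, 0.5, 20)   # hypothetical US: on for step 13 only
    print(run_trial(W_INIT, cs, us))
```

In this hypothetical configuration the weight rises at US onset, where a
positive change in output follows CS onset by three steps, and falls by a
smaller amount at US offset, so one trial leaves a net increase.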

Hebbian model

Where applicable, parameter values were the same as for the
drive-reinforcement model except that c=0.fa, the initial synaptic weight
values were 0.0, and there was no lower bound on the synaptic weights.

Sutton-Barto model

Where applicable, parameter values were the same as for the
drive-reinforcement model except that c=0.6, a =U.bi, the initial synaptic
weight values were 0.0, and there was no lower bound on the synaptic
weights.


Table 1 (column headings): Figure Number; CS and US timing (time step of
onset/time step of offset/trials during which stimulus was present)


